Praise for The Computer Always Wins

“What fun! Written with clarity and humor, this book can be appreciated by readers at every level, from teenagers at the start of their coding journey to even college professors like me.”

—Arthur Benjamin, author of The Magic of Math

“A real delight. The joy of algorithms shines through, page after page.”

—John MacCormick, author of Nine Algorithms That Changed the Future

“Great introductions to complex topics not only teach new tools, but open up entirely new ways of thinking. The Computer Always Wins is that great introduction for the eager, curious reader who wants to learn to think algorithmically. A real treat. I would have loved it when I was a student.”

—Richard Rusczyk, Founder and CEO, Art of Problem Solving

The Computer Always Wins

A Playful Introduction to Algorithms through Puzzles and Strategy Games

Elliot Lichtman

The MIT Press

Cambridge, Massachusetts

London, England

The MIT Press

Massachusetts Institute of Technology

77 Massachusetts Avenue, Cambridge, MA 02139

mitpress.mit.edu

© 2025 Elliot Lichtman

All rights reserved. No part of this book may be used to train artificial intelligence systems or reproduced in any form by any electronic or mechanical means (including photocopying, recording, or information storage and retrieval) without permission in writing from the publisher.

The MIT Press would like to thank the anonymous peer reviewers who provided comments on drafts of this book. The generous work of academic experts is essential for establishing the authority and quality of our publications. We acknowledge with gratitude the contributions of these otherwise uncredited readers.

This book was set in ITC Stone and Futura Std by New Best-set Typesetters Ltd.

Library of Congress Cataloging-in-Publication Data

Names: Lichtman, Elliot J., author.

Title: The computer always wins : a playful introduction to algorithms through puzzles and strategy games / Elliot Lichtman.

Description: Cambridge, Massachusetts : The MIT Press, 2025. | Includes bibliographical references and index.

Identifiers: LCCN 2024020063 (print) | LCCN 2024020064 (ebook) | ISBN 9780262551694 (paperback) | ISBN 9780262382304 (epub) | ISBN 9780262382311 (pdf)

Subjects: LCSH: Computer algorithms. | Board games—Mathematics.

Classification: LCC QA76.9.A43 L54 2025 (print) | LCC QA76.9.A43 (ebook) | DDC 005.13—dc23/eng/20240910

LC record available at https://lccn.loc.gov/2024020063

LC ebook record available at https://lccn.loc.gov/2024020064

10 9 8 7 6 5 4 3 2 1

EU product safety and compliance information contact is: mitp-eu-gpsr@mit.edu

Contents

Preface

How to Use This Book

Why Algorithms?

Chapter Summaries

Searching and Sorting

1 Guess Wrong Answers

2 The Road Not Taken

3 One Step at a Time

Turn-Based Strategy Games

4 Whose Turn Is It Anyway?

5 Move Faster

6 Pruning the Tree

Random Simulation

7 Throwing Darts

8 Aiming Darts

9 Aiming Darts at Others

Tracking and Training

10 Rock, Paper . . . Paper

11 Black Boxes

12 Minimizing Regret

Afterword

Python Review

References

Index

Preface

Countless websites, books, and video courses promise to teach you to code. And many will. They will teach you vocabulary like PRINT, INPUT, and RANDOM. They will hammer home every comma, semicolon, and tab. But these courses routinely forget a simple truth: the joy of learning a new language comes in the wielding of it.

In this book, I therefore take a different approach. I assume that readers already know the bare-bones fundamentals of computer programming—the commands PRINT and INPUT; control structures that use IF, ELSE, FOR, and WHILE; plus the basics of numeric variables, string variables, lists, arrays, and functions—and, using only these foundational components, I immerse readers in the parts of computer science that are downright magical.

Come teach your computer to defeat human opponents in games like tic-tac-toe and Connect Four. Marvel at the strategies computers use to quickly solve puzzle games like Wordle and sudoku. Implement real machine learning where the computer plays a strategy, evaluates the result, and adjusts its approach based on that prior experience.

Do all that, and you will learn more than just how to code. You will learn, I hope, how to truly love coding.

How to Use This Book

This book is designed for readers who already know the basics of computer programming and are now ready for more advanced concepts. That said, at the back of the book, I include a Python review that summarizes everything a reader might need in order to understand the book’s various code excerpts and solve the book’s end-of-chapter challenges. Thus, don’t worry if you come across something unfamiliar as you read this book. Everything you need to know is either explained in the text or covered in the Python review.

The book is then peppered with code excerpts, each designed to show how the book’s many ideas can be captured in clean, simple code. I also include roughly two dozen CodeLinks from which you can retrieve complete versions of the various programs described in the book. To use those, visit www.thecomputeralwayswins.com, select the corresponding CodeLink by number, and follow the instructions to either copy the code to your local computer or download the available file. That same site is also home to a large and growing mix of support materials, from new coding challenges to explanatory videos and information about various live and interactive events.

Lastly, readers looking for a good coding environment might want to download free software from python.org or code online using free platforms like replit or Google’s Colaboratory. For detailed instructions, visit www.thecomputeralwayswins.com.

Why Algorithms?

If you are reading this book, you are almost certainly an unbeatable player at the classic game tic-tac-toe. Against an unsophisticated opponent, you win every time. Against a capable opponent, you might not win but you probably never lose. And you do all of this . . . how? What rules do you intuitively use to place each X or position each O?

To teach a computer to play, you would probably start by telling the computer to place its mark in any row, column, or diagonal where it already has two marks. That is, if the computer is in position to win on the next move, you would tell the computer to do so. Next, you would tell the computer to place its mark in any row, column, or diagonal where its opponent already controls two spaces. That is, if an opponent is in position to win on the next move, you would tell the computer to block. From there, there are a handful of plausible options, but perhaps you would tell the computer to place its mark in any row, column, or diagonal where it already has one mark and the other two spaces are empty. And you might tell the computer to, as a general rule, favor the center over the corners and the corners over any other open spaces.

If you did this, the truth is that your computer would play the game reasonably well. In fact, there are 2,097 theoretically possible tic-tac-toe gameboards, and these four simple rules will reliably identify the right move on 1,995 of them. No need for anything fancy. The computer can pick the best move in 95 percent of cases simply by adopting the strategy of winning if it can, blocking if it needs to, creating two-in-a-row combinations where the third space is blank, and, as a general default, favoring the center over the corners and the corners over any other space.
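As a rough illustration, the four rules can be written down in a few lines of Python. This is a minimal sketch, not the book's own program: the board representation (a list of nine cells), the function names, and the way ties are broken within each rule are all assumptions made for this example.

```python
# Board: a list of 9 cells, each "X", "O", or " ", numbered 0-8 row by row.
LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),   # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),   # columns
    (0, 4, 8), (2, 4, 6),              # diagonals
]
CENTER, CORNERS, EDGES = [4], [0, 2, 6, 8], [1, 3, 5, 7]
PREFERENCE = CENTER + CORNERS + EDGES  # center first, then corners, then the rest


def completing_move(board, mark):
    """Return an empty cell that gives `mark` three in a line, or None."""
    for line in LINES:
        cells = [board[i] for i in line]
        if cells.count(mark) == 2 and cells.count(" ") == 1:
            return line[cells.index(" ")]
    return None


def building_move(board, mark):
    """Return an empty cell on a line holding one `mark` and two blanks, or None."""
    for cell in PREFERENCE:  # assumption: break ties by the default preference
        if board[cell] != " ":
            continue
        for line in LINES:
            cells = [board[i] for i in line]
            if cell in line and cells.count(mark) == 1 and cells.count(" ") == 2:
                return cell
    return None


def rule_based_move(board, me, opponent):
    """Apply the four rules in order and return the chosen cell (None if full)."""
    for candidate in (
        completing_move(board, me),        # rule 1: win if you can
        completing_move(board, opponent),  # rule 2: block if you need to
        building_move(board, me),          # rule 3: set up two in a row
    ):
        if candidate is not None:          # cell 0 is a valid move, so test for None
            return candidate
    for cell in PREFERENCE:                # rule 4: center, corners, anything else
        if board[cell] == " ":
            return cell
    return None
```

Because the rules are checked in a fixed order, the function returns the first one that applies. Nothing here looks ahead to see how the opponent might reply.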

Teach your computer to play tic-tac-toe this way, however, and the computer will lose every game. Why? Because a good human player will exploit the 102 situations where this haphazard approach fails. At that point, no one will celebrate the fact that your computer would have done just fine in any of 1,995 other situations. It will matter only that your code has a blind spot, and your opponents can use that blind spot against you.

This leads to two important implications. First, the fact that intuitive rules fail for even simple games is the launching point for this book. The problem is that rules are like patches: they cover a specific situation, much like a simple patch can be used to cover a specific hole, but covering a large area by rule or patch will almost always leave accidental gaps. An effective computer algorithm must therefore be more like a blanket, covering the full range of possible situations, including those that are rare, hard to foresee, or just plain hard to articulate.

Second, simple rules, like patches, are nevertheless extremely valuable. In tic-tac-toe, for example, there is no better move than one that creates three in a row. So whenever that move is an option, a well-written computer program should cut short any more complicated algorithm and triumphantly take the win.

Intuitive rules, then, will serve an important role as we train the computer to play a wide variety of everyday games. But if the computer is to have any hope of winning against capable players, we will need something more. Exploring that something more is this book’s exciting mission.

Chapter Summaries

1 Guess Wrong Answers

You are playing Wordle, and you know that the hidden word ends with the letters a-b-l-e. Is table the best next guess? Cable? Sable? Fable? A well-trained computer will tell you not to guess any of these but instead to guess the seemingly nonsense word scarf. This chapter explains why, unpacking one of the best algorithms to use when your goal is to find the needle in some proverbial haystack.

2 The Road Not Taken

A foolproof way to escape a maze is to comprehensively test every path, keeping track of where you are and where you’ve been. You walk down a path, make note of every available option, and then continue forward until you either escape or hit a dead end. If you do hit a dead end, no problem; because you have kept track of all the remaining options, you can retrace your steps and try one of those. As it turns out, this is a promising approach for more than just mazes. If you are playing sudoku, for instance, there might be several numbers that could plausibly be placed in a given square. How to pick? Choose one tentatively, see if you can from there fill in the rest of the grid without hitting a dead end, and then retrace your steps if the original choice doesn’t work. This chapter formalizes this wandering approach and then marvels at the countless puzzles it can effectively solve.

3 One Step at a Time

If your car’s navigation system applied the approach introduced in the previous chapter, you would from time to time be sent on a ridiculous journey. Your car would be parked in (say) Los Angeles, California; you would ask for directions to a nearby restaurant; and the computer would correctly tell you that one workable path would be to drive from California to Alaska, to Texas, to Ohio, and then back to that Los Angeles restaurant. Admittedly, you would ultimately arrive, but you would be hungry. This chapter therefore considers a competing algorithm that does not simply promise to find a winning path, but more powerfully promises to find the shortest one.

4 Whose Turn Is It Anyway?

When you play any two-player game, you presumably pick your move based on what you think will happen in the moves to follow. In chess, for instance, as you think about moving your left-most pawn, you probably think ahead to what your opponent will do in response to that move, what you will do in response to your opponent’s response, and so on, perhaps several moves deep. Computers can play games using this exact approach but with a huge advantage: while a human player will typically be exhausted after thinking just a few moves ahead, a computer can theoretically consider every possible future move.

5 Move Faster

The prior chapter considers strategies where the computer picks its move by literally playing out every possible future response. That turns out to be a wildly effective but frustratingly slow approach. To exhaustively play out a game of tic-tac-toe, for example, the computer would need to test something on the order of half a million game interactions just to make its first move. Do the same thing for Connect Four and the computer is stuck evaluating untold trillions. And chess? Forget about it. This chapter takes a first step toward addressing this problem by limiting the extent to which the computer looks ahead. The computer might look two or three moves ahead, but in this chapter the computer is no longer allowed to play every possible game to its definitive conclusion.

6 Pruning the Tree

We can do even better, however. Our best algorithm so far still wastes a lot of time considering completely implausible moves. For instance, even if the computer realizes that its opponent will win the game unless the computer blocks a particular spot, our current algorithm records that information but then continues to consider other options. A human player would do nothing of the sort. Once a human player finds what looks to be the best move, the human player makes it. This chapter looks at strategies that empower the computer to similarly cut wasteful analysis, saving time for more difficult choices.

7 Throwing Darts

When researchers are evaluating the efficacy of a medication, they don’t test the drug on every patient. Instead, they test a few patients, then generalize the results to fit the population. Computers, it turns out, can do something very similar. For instance, instead of analyzing two potential moves rigorously, a computer can randomly simulate fifty games using the first option, randomly simulate fifty more games using the second option, and then pick the move with the better average performance. The power of this technique might surprise you, in that it can be both remarkably fast and surprisingly accurate.

8 Aiming Darts

The prior chapter demonstrates the power of random sampling. This chapter takes the next step, shifting from random sampling to strategic sampling. For instance, if after fifty random simulations it is clear that one move is terrible while three others are plausible, a purely random approach would continue to explore all four options. A better strategy, however, would take those results and adjust, focusing all remaining simulations on the three still-plausible moves while ignoring the move that is pretty clearly a dud.

9 Aiming Darts at Others

Let’s not hurt anyone here, folks. But the two previous chapters apply random strategies to what are, in essence, one-player games. Here, we upgrade the algorithm so that it can be used to simulate two-player interactions. The chapter ends with a popcorn-worthy showdown: the two-player algorithms from chapters 4, 5, and 6 pitted against the two-player algorithms developed in chapters 7, 8, and 9.

10 Rock, Paper . . . Paper

Random? Please. When playing rock-paper-scissors, you might try to make your choices randomly, but odds are you suffer from some idiosyncratic hiccup that throws off your game. Maybe you are reluctant to play scissors twice in a row, even though a purely random player would do exactly that surprisingly often. Maybe you subconsciously favor rock, or you absentmindedly change your choice after a loss but keep it the same after a win. Because of this, rock-paper-scissors isn’t really a game of random chance; it’s a game of random chance plus pattern recognition. And who, dare I ask, is the king of pattern recognition? Yes, you guessed it: your computer.

11 Black Boxes

In every chapter thus far, I have been explicit about exactly what the computer is doing and why it works. At the cutting edge of computer science, however, are black-box strategies where these details are almost completely hidden from view. The programmer provides training data, which in our case will be some large number of already-played sample games. And the programmer builds flexible data structures and supportive functions that empower the computer to test different strategies against the data. But from there the computer finds its own way, learning from the past to create its own strategic future.

12 Minimizing Regret

People learn by interacting with their environment. Toddlers, for instance, figure out the details of walking not by listening to detailed instructions from their caregivers but by paying attention to their own bumps and bruises. Computers, too, can learn as they go. Thus, this chapter concludes our work by exploring one such learning algorithm: the computer plays the game, looks back to see where it might have made a better move, and quantifies that regret in order to gradually develop an even more promising solution strategy.

Searching and Sorting

1

Guess Wrong Answers

I am thinking of a number from 1 to 10. I chose my number randomly, and when you guess I will honestly tell you whether your guess is spot-on, too high, or too low. What should your first guess be?

Because I chose my number randomly, you might think that any first guess is equally good. If you guess the number 2, for example, you have a 1-in-10 chance of being correct. If you guess the number 9, you again have a 1-in-10 chance of being correct. Indeed, as long as you guess a number between 1 and 10, you might think that any guess is just about the same.

But no.

Because I will also be telling you whether your guess is too high or too low, different guesses have very different implications. Take an extreme: Suppose you guess the number 1. Make that guess, and you have a 1-in-10 chance of winning the game and a 9-in-10 chance of being left to choose from among nine numbers, all of which are greater than 1. Compare that with a guess of 7. You would still have a 1-in-10 chance of being right, but now you would also have a 6-in-10 chance of being told that your guess is too high, and a 3-in-10 chance of being told that your guess is too low.

Either way, you would learn a ton of information about the right answer. If your guess turns out to be too high, you can suddenly eliminate from contention the numbers 7, 8, 9, and 10. Too low, and (even better) you can eliminate seven numbers, namely 1, 2, 3, 4, 5, 6, and 7. A guess of the number 7 is thus tremendously helpful even if it turns out to be wrong, in that it serves to eliminate a huge number of incorrect answers, making your next guess that much more likely to be right.

And 7 is not even the optimal pick. As noted above, by guessing the number 7, you give yourself a 6-in-10 chance of cutting the list down to six numbers and a 3-in-10 chance of cutting the list down to three. An incorrect guess of 7 thus leaves you to choose from 4.5 numbers on average. You can do even better, however, by guessing either of the two numbers in the middle of the range, namely the numbers 5 or 6. Try 5, for instance. Once more, you would have a 1-in-10 chance of being right, but this time you would have a 4-in-10 chance of being too high and a 5-in-10 chance of being too low. The first of those possibilities would eliminate six numbers and leave you with four. The second of those possibilities would eliminate five numbers and leave you with five. Adding and multiplying as appropriate, an incorrect guess of 5 would leave you with just 4.1 options on average, an outcome significantly better than the 4.5 associated with an incorrect guess of 7. Guessing 6 is then an equally good option, for essentially the same reasons.
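The "adding and multiplying" can be checked with a short calculation. The sketch below is only an illustration; the function names are invented for this example. It weights each possible outcome of a guess by its probability and multiplies by the number of candidates that outcome leaves behind (a correct guess leaves none).

```python
def expected_remaining(guess, low=1, high=10):
    """Average number of candidates left after guessing `guess` in low..high."""
    total = high - low + 1
    below = guess - low    # secrets smaller than the guess ("too high" feedback)
    above = high - guess   # secrets larger than the guess ("too low" feedback)
    # Each outcome happens with probability (count / total) and leaves `count`
    # candidates; a correct guess leaves zero, so it adds nothing to the sum.
    return (below * below + above * above) / total


def best_guesses(low=1, high=10):
    """Return every guess that minimizes the expected number of candidates left."""
    scores = {g: expected_remaining(g, low, high) for g in range(low, high + 1)}
    best = min(scores.values())
    return [g for g, score in scores.items() if abs(score - best) < 1e-9]


print(expected_remaining(7))  # 4.5
print(expected_remaining(5))  # 4.1
print(best_guesses())         # [5, 6]
```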

We can think of algorithms like this as elimination algorithms, where the key insight is that these strategies focus not solely on picking right answers but also on quickly eliminating wrong ones. And elimination is way more powerful than you think. Play our guessing game using the numbers 1 through 1,024, for example, and an algorithm that always picks the middle number leaves you with a single candidate after no more than ten guesses. Play the game using numbers up to 1,000,000, and pick-the-middle guarantees you a win within twenty tries. The underlying math in these examples is just division. Every time you pick the middle number, either you win or you eliminate half the remaining options.

Players using a guess-the-middle strategy cut the list of numbers in half every move. Starting with 1,024 numbers, then, a player’s worst-case scenario is to have 512 numbers left after the first guess, 256 left after the second guess, and just 1 left after the tenth guess.
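Here is one way the halving argument might look in code. It is a sketch under a few assumptions: the function names are invented for this example, and the player always guesses the lower of the two middle numbers when the range has an even length. The first function plays one game; the second follows the larger leftover half at every step to find the worst case without playing every game.

```python
def guesses_needed(secret, low, high):
    """Play pick-the-middle and count guesses up to and including the correct one."""
    count = 0
    while True:
        guess = (low + high) // 2
        count += 1
        if guess == secret:
            return count
        if guess > secret:
            high = guess - 1   # too high: keep only the smaller numbers
        else:
            low = guess + 1    # too low: keep only the larger numbers


def worst_case_guesses(n):
    """Worst-case number of guesses for the numbers 1..n under pick-the-middle."""
    count = 0
    while n > 0:
        count += 1
        n //= 2   # a wrong middle guess leaves at most n // 2 candidates
    return count


print(worst_case_guesses(1_000_000))  # 20
print(worst_case_guesses(1_023))      # 10
print(worst_case_guesses(1_024))      # 11
```

Counted strictly, with the winning guess included, 1,024 numbers can take eleven guesses in the worst case: ten to isolate the last candidate and one more to name it. With 1,023 numbers, ten guesses always suffice.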

If you are with me thus far, you are probably beginning to realize that one of the games you played as a kid was a lot more interesting than you appreciated at the time. The game I have in mind: Guess Who?

Here’s how the game works. You are given a gameboard with some number of cartoon faces and your opponent chooses one of them as their secret character. Your opponent is also given a gameboard with some number of cartoon faces, and you, too, choose one to be your secret character. From there, the two of you take turns asking each other yes/no questions, racing to be first to identify the other person’s mystery pick. I had the superhero version of the game, so as a kid I would ask questions like “Does your secret hero wear a cape?” or “Can you see your secret hero’s hair?”

What’s interesting is that, as a kid, I always asked straightforward questions like the examples given above, and I was pleased whenever an affirmative answer would eliminate, say, four or five heroes. But we just saw that dividing by two is the fastest way to whittle down a list, which means I should have been asking questions designed to create two equal-sized groups every time. How? One effective approach would have been to sharpen my questions using words like and and or.

Against the sample board shown below, for instance, a great opening question would be to ask whether the secret hero has either a cape or goggles, since half the characters satisfy that constraint but half do not. A positive response might best be followed by another compound question, maybe asking whether the secret hero has both a helmet and goggles.

For the sample gameboard on the left, a question about “capes or goggles” eliminates half the faces. From there, a follow-up about “helmets and goggles” would again divide the number of remaining faces by two.

The games we have considered thus far have all been games in which it was relatively easy to implement elimination. In guess-a-number, for instance, we had to keep track of an ordered list of numbers, and we had to find the middle number in that list. In Guess Who, we had to keep track of which cartoon characters remained in contention and also notice relevant characteristics that could differentiate the various characters. But elimination algorithms sometimes require so much recordkeeping that no human player can possibly track all the data, let alone identify the optimal move. It is in those instances that computers can meaningfully outperform their human counterparts.

Consider the game Wordle in this light. A five-letter word is chosen at random and hidden from the player. The player then attempts to guess the hidden word but with one important constraint: every guess must itself be a real five-letter word. After each guess, the player is given feedback about the letters they chose. If a letter in the guess is highlighted green, that letter is not only found in the hidden word but also found at exactly the same position in that word as it is in the guessed word. If a letter is highlighted yellow, the letter is in the hidden word but at some other spot. Lastly, if a letter is neither green nor yellow, the letter is not part of the hidden word at all.
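The feedback rules translate into a short function. This is a minimal sketch with an invented name and an invented output format (a string of `G`, `Y`, and `-` for green, yellow, and neither). It also makes one assumption that the description above does not spell out: when a letter is repeated, each copy in the hidden word can account for only one highlighted square, which is how the game is commonly understood to behave.

```python
from collections import Counter


def feedback(guess, hidden):
    """Return one character per letter: 'G' green, 'Y' yellow, '-' neither."""
    result = ["-"] * len(guess)
    unmatched = Counter()  # hidden letters not already matched in place
    # First pass: mark greens and tally the hidden letters that remain.
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            result[i] = "G"
        else:
            unmatched[h] += 1
    # Second pass: a non-green letter turns yellow only while copies remain.
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


print(feedback("teach", "brave"))  # -YG--
print(feedback("glare", "brave"))  # --GYG
```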

Three sample games are shown below. For instance, in the first game on the left, my opening guess was the word teach. I earned a close-but-not-quite reaction for the letter e and a full-throated endorsement for my placement of the letter a, so I knew that the hidden word uses the letter e, has an a in the middle, and does not use the letters t, c, or h. That led to my second guess, the word glare, which turns out to correctly place both the e and the a and then also reveal, thanks to the new shaded square, that the hidden word has an r in either the first or second position. My third guess of brake was then very helpful; it taught me that the first letter of the hidden word is b and that r is in fact the second letter. From there, there was only one plausible option left to guess, so I chose brave and won the game in that fourth move.

In the first and third games, I was able to guess the hidden word on my fourth try. In the second game, I came close but needed a fifth guess to finally get there.

One intuitive way to play Wordle is to focus early guesses on letters that tend to show up frequently in five-letter words. Using this approach, a player might open the game by guessing the word raise because the letters r, a, i, s, and e each commonly appear in real five-letter English words. A more sophisticated approach might consider letter position, too, perhaps picking a word like raves because the letters a and s are not only commonly used in five-letter words but also frequently appear in the second and final positions, respectively. A player adopting this approach would then piece together the hidden word letter by letter. For instance, if an opening guess of beast gave favorable information about the placement of the letter e and the inclusion of the letter t, the player might try, as their next guess, a word like tempo or fetch, two words that keep the e in the second spot and have a t somewhere other than the second and fifth positions.

But now try an elimination approach. The New York Times publishes new Wordle puzzles every day, choosing hidden words from a database of 2,315 five-letter options. In an elimination strategy, the goal would be to eliminate some substantial number of those 2,315 words in the first guess. For example, if we were to pick the word apple and end up with no green and no yellow boxes, that feedback would turn out to eliminate 1,888 words from the list: every word that uses an a, a p, an l, or an e. If we pick the word apple and end up with a green l and a yellow e, the feedback this time would eliminate an even more impressive 2,297 words: every word that uses an a, every word that uses a p, every word that has a letter other than l in the fourth position, and every word made without even one e.

Even random guesses can quickly eliminate large numbers of previously possible words. The feedback from a guess of brave, for example, makes clear that 2,305 words in the word bank are not right because those words would have led to different feedback.

The chart above captures four additional examples, comparing the random guesses brave, cheap, funny, and weird against the hypothetical hidden word great. The promising payoff: nearly every guess eliminates hundreds of potential words, making the next guess that much easier. This suggests a plausible Wordle algorithm. We would begin with the full New York Times list of all eligible Wordle words. The computer would then guess one of those words at random. Based on the feedback, the computer would eliminate the clear losers and then draw a new random guess from among the remaining words. The computer would repeat that process again and again until the hidden word was either proposed as a guess or was the only word left in the queue. And that would work fine. Indeed, when I tested this approach, the computer guessed my hidden word lucky in just five tries, and it guessed my hidden word green in four. I then tested the entire 2,315-word database, and the random approach was able to identify the hidden word in fewer than five guesses roughly one third of the time, and it needed fewer than seven guesses for all but 187 words. That’s not genius performance, but it’s not bad.
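The random elimination loop described above might look something like the following sketch. It is not the book's code: the names `random_elimination` and `score_guess` are hypothetical, and the fourteen-word `WORDS` list is a small stand-in assembled from words mentioned in this chapter rather than the real 2,315-word database.

```python
import random
from collections import Counter

# Stand-in word bank (an assumption); the real list has 2,315 entries.
WORDS = ["brave", "cheap", "funny", "weird", "great", "lucky", "green",
         "zonal", "slimy", "wryly", "lefty", "brood", "crepe", "greet"]


def score_guess(guess, hidden):
    """Feedback string: G green, Y yellow, - neither."""
    feedback = ["-"] * len(guess)
    unmatched = Counter(h for g, h in zip(guess, hidden) if g != h)
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            feedback[i] = "G"
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g != h and unmatched[g] > 0:
            feedback[i] = "Y"
            unmatched[g] -= 1
    return "".join(feedback)


def random_elimination(hidden, word_list, rng):
    """Guess at random, discarding every word the feedback rules out."""
    candidates = list(word_list)
    guesses = []
    while True:
        guess = rng.choice(candidates)
        guesses.append(guess)
        if guess == hidden:
            return guesses
        feedback = score_guess(guess, hidden)
        # Keep only words that would have produced this same feedback.
        candidates = [w for w in candidates
                      if score_guess(guess, w) == feedback]


print(random_elimination("lucky", WORDS, random.Random(0)))
```

The key line is the filter: a word survives only if, had it been the hidden word, the guess would have earned exactly the feedback we saw. The guess itself never survives a wrong guess, and the hidden word always does, so the loop is guaranteed to end.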

When my hidden word was lucky, the computer’s random guesses were, in order, zonal, slimy, wryly, lefty, and lucky. When my hidden word was green, the computer guessed brood, then crepe, then greet, then green.

But as we learned from both guess-a-number and Guess Who, computers do even better when they pick their guesses strategically. To consider how to be strategic in this context, let’s play the game with an artificially short list of permissible words so that we can really see what happens as the computer makes different guesses. Thus, instead of using all 2,315 words that the New York Times itself uses to create its daily puzzle, consider a simplified version of the game where the hidden word is guaranteed to be one of the eighteen words listed below.

If our first guess is the word adept and the hidden word is also adept, we will obviously win the game. Assuming that the hidden word is picked at random, there is a 1-in-18 chance of that happening. By the same logic, there is a 1-in-18 chance that the hidden word is after, a 1-in-18 chance that it is agent, a 1-in-18 chance that it is actor, and so on, for all eighteen words. The chart below captures the feedback we should therefore expect. For example, if we guess adept and the hidden word is great, treat, or wheat, the feedback we will receive will be that the e and t are both properly placed and the a is correct but in the wrong position. There is a 3-in-18 chance of that happening—the chance that the hidden word is great, treat, or wheat—and, if we do get that feedback, in our next guess we would need to distinguish between only those three words.

By the same logic, if we guess adept and the hidden word is cater, eaten, eater, extra, hater, taken, taker, or water, we will see a response indicating that the a, e, and t are all in the puzzle but in other positions. There are eight ways that can happen, and, if it does, in our next move we will have to distinguish between only those eight possible words. The chart below runs through the full list in this spirit. If we guess adept, these are all the possible responses we might see, and what those responses mean in terms of how many words would still be in contention.

This chart shows the full range of feedback we might see if we guess the word adept. Some of the feedback immediately makes clear what the hidden word must be. Other feedback is less conclusive but still helps to significantly narrow the list of plausible words.
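A chart like this can be rebuilt by scoring the guess against every candidate and grouping the candidates by the feedback they would produce. The sketch below assumes that the eighteen-word list consists of the eighteen words named in this section; the printed list itself is not reproduced here, so treat `WORDS` as a reconstruction. The helper names are hypothetical.

```python
from collections import Counter, defaultdict

# Reconstructed from the words named in the text (an assumption).
WORDS = ["adept", "after", "agent", "actor", "avert", "cater",
         "eaten", "eater", "extra", "great", "hater", "taken",
         "taker", "taper", "treat", "tweak", "water", "wheat"]


def score_guess(guess, hidden):
    """Feedback string: G green, Y yellow, - neither."""
    feedback = ["-"] * len(guess)
    unmatched = Counter(h for g, h in zip(guess, hidden) if g != h)
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            feedback[i] = "G"
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g != h and unmatched[g] > 0:
            feedback[i] = "Y"
            unmatched[g] -= 1
    return "".join(feedback)


def group_by_feedback(guess, candidates):
    """Map each possible feedback string to the hidden words that cause it."""
    groups = defaultdict(list)
    for hidden in candidates:
        groups[score_guess(guess, hidden)].append(hidden)
    return dict(groups)


groups = group_by_feedback("adept", WORDS)
# Largest groups first: these are the least helpful responses.
for feedback, words in sorted(groups.items(), key=lambda kv: -len(kv[1])):
    print(feedback, words)
```

With this list, the guess adept yields one group of eight words, one group of three, one group of two, four single-word groups, and the all-green response for adept itself, which agrees with the counts discussed in the text.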

We can make similar charts for all eighteen words in the list. Consider, for example, a chart focused on the guess extra. If the feedback were to indicate that the e, t, and a are all used in the hidden word but in other locations, that would allow us to narrow the list of remaining words to just five: adept, agent, taken, tweak, and wheat. Were we instead to receive feedback indicating that the e and t are both correctly placed while the a is part of the hidden word but located elsewhere, we would immediately know that the hidden word must be eaten because that is the only word on the list that satisfies those constraints and does not include either an x or an r.

If we made all eighteen charts, we could then pick the optimal guess. Consider the word taken. For adept, the worst outcome we found was feedback that left us with eight remaining words. For a guess of taken, by contrast, the worst case turns out to be feedback that endorses the use of the letters t, a, and e, but in other positions, which leaves five words in play: adept, avert, extra, great, and wheat. If our goal is to minimize our worst case, taken is thus a better guess than adept because its worst case leaves us with fewer words. Even better options along these lines are the words after, cater, eater, hater, taker, taper, and water because, for each of those, the worst case leaves us with only four possible words.

Rather than focusing on the worst case, we could focus on the average number of words expected to remain after an incorrect guess. For adept, we had an 8-out-of-18 chance of being left with eight words, a 3-out-of-18 chance of being left with three words, a 2-out-of-18 chance of being left with two words, and a 4-out-of-18 chance of being left with just one word. Weighting each outcome by its probability gives (8 × 8 + 3 × 3 + 2 × 2 + 4 × 1) / 18 = 81 / 18, which suggests that we would, on average, be left with 4.5 words after wrongly guessing adept. Running the same math for taken tells us that, on average, an incorrect guess of taken will leave us with 2.83 words to consider. And running this same calculation for all eighteen words in the list ends up suggesting that our best bets are the words water and taper because each of those leaves us with an average of merely 2.17 words for our next guess.
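Both yardsticks, the worst case and the average, can be computed from the same grouping. The sketch below is a rough illustration with hypothetical function names. It again assumes that the eighteen-word list is made up of the words named in this section, so results for other guesses may not match the book's charts exactly. Following the arithmetic in the text, a correct guess counts as leaving zero words.

```python
from collections import Counter

# Reconstructed from the words named in the text (an assumption).
WORDS = ["adept", "after", "agent", "actor", "avert", "cater",
         "eaten", "eater", "extra", "great", "hater", "taken",
         "taker", "taper", "treat", "tweak", "water", "wheat"]


def score_guess(guess, hidden):
    """Feedback string: G green, Y yellow, - neither."""
    feedback = ["-"] * len(guess)
    unmatched = Counter(h for g, h in zip(guess, hidden) if g != h)
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g == h:
            feedback[i] = "G"
    for i, (g, h) in enumerate(zip(guess, hidden)):
        if g != h and unmatched[g] > 0:
            feedback[i] = "Y"
            unmatched[g] -= 1
    return "".join(feedback)


def group_sizes(guess, candidates):
    """Sizes of the feedback groups, ignoring the all-green (winning) one."""
    counts = Counter(score_guess(guess, hidden) for hidden in candidates)
    counts.pop("G" * len(guess), None)
    return list(counts.values())


def worst_case(guess, candidates):
    """Most words that could remain after a wrong guess."""
    return max(group_sizes(guess, candidates), default=0)


def average_remaining(guess, candidates):
    """Expected words remaining: each group of size n occurs with chance n/N."""
    sizes = group_sizes(guess, candidates)
    return sum(n * n for n in sizes) / len(candidates)


print("adept worst case:", worst_case("adept", WORDS))
print("adept average:", average_remaining("adept", WORDS))
print("taken worst case:", worst_case("taken", WORDS))
```

A solver built on this idea would evaluate every allowed guess with one of these two functions and pick the guess with the lowest score, then repeat on whichever group of words the feedback leaves behind.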

But hold on. Ready for this? Our analysis thus far has only considered guesses that could turn out to be the actual hidden word. That is, we have eighteen words in this version of the game and we have only considered using those eighteen words as possible guesses. But crazy as it might sound, it is possible that some other word will be an even better guess. For example, consider the word there. Because there is not in our list of eighteen possible words, there will definitely not be the right answer. However, as the chart below shows, the worst-case outcome associated with the word there would be a list of three remaining words, and the average number of words remaining would be a shockingly low 1.67. This means that there—a word not in our list of possible answers—is likely a better guess than adept, taken, or indeed any word in our list, regardless of whether our goal is to minimize the worst case or to minimize the average number of words remaining.

This is surprising. An intuitive human player would never guess there because there is not a possible winning word in this example. But in an elimination algorithm, part of our goal is to eliminate as many duds as we can, and so all of a sudden a word like there might well be our best option even though we know we cannot possibly win the game by guessing that particular word.

The word there cannot be the correct answer because it is not listed in the word bank. As a first guess, however, it turns out to be a fabulous choice regardless of whether the goal is to minimize the worst case or minimize the average.

Chapter Challenge

Your challenge this chapter is to write a Wordle solver using the elimination algorithm we just explored. To get you started, this CodeLink will take you to sample Python code that plays a simple version of the game. There is an array called WORDLIST that stores a few thousand permissible five-letter words. There is a function called HIDEWORD() that randomly chooses one of the words to be the hidden word; a function called GUESSWORD() that allows the computer to guess; and a function called SCOREGUESS() that evaluates the guess and provides feedback on the screen. In the sample code, however, the computer’s guesses are completely random. That is, the computer randomly chooses a word from the list, guesses it, and, if that word is not the hidden word, the computer randomly chooses some other word, having learned nothing from its prior tries. Your job is to replace this random function with a function that implements a thoughtful elimination process, perhaps based on the average number of words remaining or on worst-case analysis.

As you code, think about whether there are other improvements you can make to the algorithm. For example, suppose that guessing the word mouse would leave the words catch, match, and latch as the remaining possible choices, whereas guessing the word party would leave the words north, first, and aargh. In our work thus far, we have treated these two outcomes as equivalent because in both cases there are three words left to consider. But catch, match, and latch are worse words than north, first, and aargh because it is more difficult in the next guess to distinguish between catch, match, and latch than it would be to distinguish between north, first, and aargh. Is there a plausible improvement to our algorithm that would account for the fact that some lists of words are actually easier to evaluate than others?

Lastly, if you enjoy Wordle, consider how you might change the algorithm to play other versions of the game. For instance, in Fibble, the letter-by-letter feedback includes one lie per round, such that a player might be told that the letter f is in the right spot when, in fact, f is not used in the hidden word at all. Or, my favorite Wordle variation: in Absurdle, the secret word changes from round to round. The new word is always consistent with the feedback already given, but beyond that the word is chosen to make things as hard as possible for the guessing player.

2

The Road Not Taken

You are lost in a maze. You see only two paths, one heading east and the other heading south. You know that there is an exit to the maze, if only you can find it. A clock starts to tick. Which direction should you go first?

Because you have no information about where the exit might be, your first step will be unavoidably random. Suppose you flip a coin and decide to go east. As you do, you see another fork, this time with another path heading south and another path heading east. Again you have no information. You randomly keep going east. Dead end. You trace your steps back to the prior fork. Last time, you went east. This time, maybe south? Sigh. Another dead end. So back you go to the prior fork. When you first entered the maze, you went east. That leaves just one option to explore: south. You take a deep breath and go south.

The intuitive strategy implemented above is an approach that computer scientists call backtracking. The idea is to test a move, see what happens, and then, if the outcome is disappointing, trace back to that same intersection and try again. In the example above, we backtracked after hitting our first dead end, and then we backtracked again after hitting our second dead end. We ultimately backtracked all the way to our starting point, which allowed us to try what turned out to be the best option: at the first fork, go south.

Whenever we reach a dead end, we backtrack by reversing our most recent move. We also backtrack whenever we have exhausted all the choices available at a given intersection without succeeding.

Backtracking is a powerful strategy, but it comes with one substantial challenge: keeping track of all the data. Our maze example was not particularly onerous along this dimension in that there were only two forks in the road and each fork had only two options. It was therefore relatively easy to remember the choices we made, their order, and the list of paths we left unexplored. But try to apply a backtracking algorithm to a nine-by-nine sudoku board and things get pretty messy, pretty fast.

In case you are unfamiliar, sudoku is a game played on a nine-by-nine grid where the player is asked to fill every empty space with a number 1 through 9 such that no digit repeats in any row, in any column, or within any of the board’s smaller three-by-three squares. In the example board shown here, for instance, the three-by-three square in the upper left corner already has the numbers 2, 6, 8, and 9, which means that none of the remaining spaces in that three-by-three square can be filled with those digits. The empty slot in that square’s first row is further limited by the fact that, in that row, the digits 1, 2, 6, and 8 have already been used, and, in that column, the digits 2, 4, 8, and 9 have also already been used. The result is that this space can only be filled with one of three available digits: 3, 5, or 7.

The figure captures only a tiny sliver of the decision tree we would need to draw if we wanted to backtrack our way through a sudoku board. The full tree would require many pages and lots of coffee.

You and I could backtrack our way through this puzzle. We might start by testing the number 3 in the slot on which we have been focused, and then placing either the number 4 or the number 5 in the empty space two to the right. We could then pencil in other digits for the other open slots, testing every option until the combination under review yields either a winning outcome or an illegal move. In the event of a dead end, we would backtrack: we would remove the most recently added digit and see if some other, untested digit might work. If so, we would move forward again. If not, we would remove both that digit and the prior digit and try, try again.

This process can be represented using a “decision tree” where each node represents the then-current board and each line represents the decision to add some specific new number to the grid. And yes, for you and me, a tree like this would be overwhelming. Indeed, the figure below shows just a few branches of the tree we have been describing, and already it’s a staggering mess. We therefore begin this chapter with a coding concept that will help us track information. This new construct is a type of function called a recursive function.

Imagine that you are standing in line waiting your turn to purchase ice cream from a local shop. Impatient, you would like to know how many people are ahead of you in line, but you worry that you will lose your place if you step out of line to count the people. So you lean forward to the customer in front of you and ask how many people are in front of them. This customer has no idea, of course, but they, too, are now both impatient and curious, so they lean forward and ask the same question to the person in front of them. And thus the process continues down the line until the second person in line asks this question to the very first person. That person is about to indulge in a triple scoop chocolate sundae and thus easily responds “zero,” which is a response that lets the second person calculate that there is one person in front of them, which in turn lets the third person calculate that there are two people in front of them, and so on, back up the line. Ultimately, the person before you tells you their number, you add one, and you know yours.

The strategy used in this example is recursive in that we are solving the counting problem by creating smaller and smaller versions of that same problem and then using each of those smaller problems to answer the slightly larger version that came before it. More concretely, we taught every person in the line to think, “Counting every person in front of me would be a big job. But if the person directly in front of me were to tackle the slightly smaller challenge of counting the line in front of them, I could easily solve my problem by adding one to the total.” The only special case involved the person who was lucky enough to be at the front of the line. That person could see the ice cream and so could confidently say “zero” while reaching for a spoon.

The code below implements this approach. The code uses an array to represent the line of people waiting for ice cream. The function COUNTLINE() is then a recursive function that takes, as input, the index of the person who is calling the function. The function from there follows one of two paths. If the person who called the function is looking directly at the ice cream, the function returns a zero because, from that person’s perspective, there is no line. If the person is not looking directly at the ice cream, however, the function recursively calls itself, using the same function to solve the slightly smaller problem of counting the line from the perspective of the person one index number ahead. The main program then calls the function, starting the process.
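As a stand-in for the listing described above, here is a minimal sketch of the same idea. The names `count_line` and `line`, and the choice to pass the array as a parameter, are assumptions made for this illustration. The sketch places the person at the counter in the last slot of the array, so that "one index number ahead" means one position closer to the ice cream.

```python
def count_line(line, position):
    """Return how many people stand ahead of the person at this index."""
    # Base case: the last slot in the array is the person at the counter,
    # who is looking directly at the ice cream and has no line to count.
    if position == len(line) - 1:
        return 0
    # Recursive case: ask the person one index ahead, then add one for them.
    return 1 + count_line(line, position + 1)


# Index 0 is the impatient customer at the back of the line.
line = ["you", "Ana", "Ben", "Cyd", "Dee"]
print(count_line(line, 0))
```

Each call does almost no work of its own. It either answers immediately or hands a slightly shorter line to the next call and adds one to whatever comes back.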

What’s important to notice here? First, COUNTLINE() calls itself. That is, this function solves the problem at hand by sometimes calling itself to solve a slightly smaller, similar problem. Second, and critically important, COUNTLINE() can sometimes do something other than call itself. This is crucial because at some point the function’s analysis must end. There must be some situation where the function can answer whatever question is pending without spinning up yet another, smaller question. Without this, recursive code would go on forever.

Third, and the reason we started thinking about recursion: COUNTLINE() keeps track of all the messy data that we were worried about when we first thought about using backtracking to address a big challenge like our nine-by-nine sudoku puzzle. The graphic below shows how. The first time we call COUNTLINE(), the computer creates a space in memory to record the fact that the function was called with the position parameter set to 0. When COUNTLINE() then calls itself again, the computer creates a new space in memory; it records the fact that COUNTLINE(1) was called by COUNTLINE(0); and it further records the fact that COUNTLINE(0) is waiting for a response from COUNTLINE(1). When COUNTLINE(1) later calls COUNTLINE(2), yet another space is created in memory, and yet again the computer keeps track of the relationship between the new call and the previous one.

Each square in this diagram represents a separate location in the computer’s memory. The point: every time our program calls the COUNTLINE() function, the computer separately records information about the call, the variable values, and what should happen when there is an answer to return.

In short, recursive functions implicitly draw the decision trees that we drew earlier in the chapter. Every time a recursive function is triggered, the computer finds a new location in memory and writes down all the critical information. The computer keeps track of the values of the relevant variables. The computer keeps track of the relationship between the called function and the calling function. And the computer can then use those records to move through even a long and convoluted decision-making process . . . which is exactly what we need for backtracking.

Applying all that, we can now write a recursive backtracking function. Let’s imagine that, elsewhere in our code, we have created a simple five-by-five maze with rows numbered 0, 1, 2, 3, and 4, and columns also numbered 0, 1, 2, 3, and 4. Open spaces in the maze are represented by the number 0, and walls are represented by the number 1. Let’s further stipulate that the computer can only enter this maze at position MAZE[0][0] and the computer’s goal is to find a path to the exit, which is position MAZE[4][4]. Lastly, let’s define the variables CURRENTROW and CURRENTCOLUMN to be variables that store (wait for it) the computer’s current position in terms of its row number and column number.

In COUNTLINE(), our nonrecursive case triggered whenever we reached the front of the line. That is, in COUNTLINE(), we did not need to trigger recursion when we finally reached that first person and asked about the line. They simply answered; there was no need to spin up some new, smaller version of the problem. For our maze code, there will be two circumstances where we will not need to trigger additional recursion. One will be where we have arrived at the exit and can simply declare victory. The other will be where we have fully tested a specific position and know that there is no way to reach the exit from that particular spot. We will backtrack in the latter case, rather than starting another recursive search.

As for the recursive prong, in COUNTLINE() our recursion was linear; if we were not already at the front of the line, we moved forward one position and counted the line from there. Here, because we are exploring a two-dimensional maze, every recursive call needs to consider four possible moves: move up one row, move down one row, move left one column, and move right one column. Note that we will have to check whether those moves are legitimate. The computer cannot move up one row, for instance, if that would mean crashing into a wall or stepping outside the boundaries of the maze. But whenever it is legal to do so, our recursive step will need to consider those four options.

Now, one final wrinkle. In our intuitive explanation of backtracking at the start of the chapter, we never moved in a circle. For example, when we reached the first fork in the road, we considered the possibility of moving east and also the possibility of moving south, but we did not at that moment consider the possibility of turning around and immediately heading back west. Yes, we did ultimately revisit points to the west. But we did so only after we had fully finished exploring the original path to the east. This turns out to be an important but unspoken constraint. Had we allowed ourselves to go west, we might have ended up wandering endlessly, moving east and west and east and west and east again, all without exploring any new ground.

We can address this issue in our code by crossing out positions as we visit them. Suppose, for instance, that we are standing at MAZE[2][2] and are about to explore the position immediately below us, which is MAZE[3][2]. Our recursive function would tell the computer to start at MAZE[3][2] and from there consider the possibility of moving up one row, down one row, left one column, and right one column. This time, however, we actually do not want the computer to consider moving up one row because that would risk triggering a never-ending up/down/up/down cycle. So before making the recursive call, we need to mark MAZE[2][2] as visited, so that the computer knows to treat it like a wall. Then, later, when the computer finishes this part of its exploration and is ready to backtrack to some prior decision point, we can unmark this space, resetting the maze so that the computer can easily test other paths that might end up passing through this position, too.

Sample code follows. The excerpt itself shows only the recursive function and not the full program. The full code, however, is available at the CodeLink, including the helper function ISLEGAL(), which checks that the proposed position is not a wall, a previously visited location, or a space outside the maze’s boundaries; and the helper function ISEXIT(), which checks whether the computer has reached position MAZE[4][4].
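A minimal sketch of a recursive function along the lines just described might look like the following. The maze layout, the numeric codes for walls and visited squares, and the bodies of the two helpers are illustrative assumptions, not the CodeLink versions.

```python
OPEN, WALL, VISITED = 0, 1, 2

# Hypothetical 5x5 maze (an assumption for illustration); the exit is MAZE[4][4].
MAZE = [
    [0, 0, 1, 0, 0],
    [1, 0, 1, 0, 1],
    [0, 0, 0, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 1, 1, 0],
]

def is_legal(maze, row, col):
    """True if (row, col) is inside the maze and is neither a wall nor visited."""
    in_bounds = 0 <= row < len(maze) and 0 <= col < len(maze[0])
    return in_bounds and maze[row][col] == OPEN

def is_exit(row, col):
    """True if the computer has reached position MAZE[4][4]."""
    return (row, col) == (4, 4)

def solve_maze(maze, row, col, path):
    """Return True if the exit can be reached from (row, col); path records the route."""
    path.append((row, col))
    if is_exit(row, col):              # nonrecursive case: declare victory
        return True
    maze[row][col] = VISITED           # cross out this spot before recursing
    for d_row, d_col in [(-1, 0), (1, 0), (0, -1), (0, 1)]:  # up, down, left, right
        if is_legal(maze, row + d_row, col + d_col):
            if solve_maze(maze, row + d_row, col + d_col, path):
                return True
    maze[row][col] = OPEN              # nonrecursive case: dead end, so unmark and backtrack
    path.pop()
    return False

route = []
found = solve_maze(MAZE, 0, 0, route)
print(found, route)
```

Squares on the winning route stay marked as visited in this sketch, while every abandoned square is reset so that later paths can pass through it.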

Now for the mind-blowing step: an almost identical function can be used to play sudoku. We need a few new game-specific helper functions this time. Specifically, we need an ISPOSSIBLE() function that returns TRUE if a proposed number can legally be placed in the proposed space, and we need an ISFILLED() function that returns TRUE if there are no spaces left to fill. But the recursive function itself follows the exact pattern we used for the maze. There is no recursive call if either the board is filled or the previously placed numbers do not work. Otherwise, the function moves to the next blank space, tests every number 1 through 9, and, for any seemingly workable option, tries from there to recursively solve the rest of the puzzle.

Sample code appears below. Once more, the excerpt shows only the recursive function, but the rest of the code is available by following the CodeLink. That version also includes some print statements that will make clear exactly what the computer is doing as it runs through its many layers of recursion.
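As a rough sketch of the pattern, and not the CodeLink version, the recursive sudoku function might be written as follows. The board representation (a 9x9 list of lists with 0 for a blank), the helper bodies, and the sample puzzle are all assumptions made for illustration.

```python
def is_possible(board, row, col, num):
    """True if num does not already appear in the row, the column, or the 3x3 box."""
    if num in board[row]:
        return False
    if any(board[r][col] == num for r in range(9)):
        return False
    top, left = 3 * (row // 3), 3 * (col // 3)
    return all(board[r][c] != num
               for r in range(top, top + 3)
               for c in range(left, left + 3))

def is_filled(board):
    """True if there are no blank spaces left to fill."""
    return all(cell != 0 for line in board for cell in line)

def solve_sudoku(board):
    """Fill the board in place; return True if a full solution was found."""
    if is_filled(board):                       # nonrecursive case: nothing left to do
        return True
    # Move to the next blank space, scanning row by row.
    row, col = next((r, c) for r in range(9) for c in range(9) if board[r][c] == 0)
    for num in range(1, 10):                   # test every number 1 through 9
        if is_possible(board, row, col, num):
            board[row][col] = num
            if solve_sudoku(board):            # recursively solve the rest
                return True
            board[row][col] = 0                # undo the guess and backtrack
    return False                               # nonrecursive case: earlier numbers do not work

# Hypothetical puzzle: a valid grid built from a shifting pattern, with
# every third column blanked out.
solution = [[(3 * (r % 3) + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = [[0 if c % 3 == 0 else solution[r][c] for c in range(9)] for r in range(9)]

solved = solve_sudoku(puzzle)
for line in puzzle:
    print(line)
```

A puzzle built this way may have more than one valid completion; the function simply stops at the first one it finds.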

Again, the shocking thing here is that our maze and sudoku functions are so very similar. These are completely different games, but backtracking is backtracking. So both functions establish at least some conditions where no recursive call is required. Both establish recursive cases designed to explore every possible next move in the game. And both use the strategy of simplifying a big problem by reframing it as a series of similar, smaller ones. In the maze code, each recursive call addresses a “smaller” problem in the sense that each evaluates the maze after one more step has been taken. In the sudoku code, each recursive call is “smaller” in the sense that it evaluates a board that has one fewer blank space. These features define recursion no matter the application. There is always at least one case that the computer can immediately solve, plus at least one recursive case that frames a smaller version of the exact same problem. And with that coding construct, we can backtrack our way through almost any maze-like puzzle.

Chapter Challenge

In chess, the queen is one of the game’s most powerful pieces. Other pieces suffer various limitations in both how and how far they can move in any single turn. The queen, by contrast, can move horizontally across its row, vertically along its column, or diagonally along either of the two relevant diagonals. Moreover, the queen is free to stop anywhere along its path or to continue all the way until it either reaches the edge of the gameboard or hits another piece, in which case the queen captures that piece and takes its position on the board.

In the game Eight Queens, a player is challenged to place eight queens on a standard eight-by-eight chess board such that no queen can take any other queen on its next move. For example, in the diagrams that follow, the placement on the left is not successful because, among other problems, two of the queens are on the same diagonal. The placement in the middle is also not successful, this time because two of the queens are on the same vertical. The placement on the right, by contrast, is a successful placement: no queen can take any other queen on the next move.

There are over four billion different ways to place eight queens on an eight-by-eight chess board. Only ninety-two of them, however, are successful solutions to the Eight Queens puzzle. Your challenge in this chapter is to write a program that uses backtracking to find those ninety-two solutions. The CodeLink will get you started. There, I generate the gameboard, randomly place the queens, and test to see whether the random solution works. Unsurprisingly, I almost never find a workable pattern. Can you replace my random function with a backtracking function that exhaustively searches for the ninety-two winning layouts?

As you code, think about whether your code actually tests all four billion possible patterns (I hope not) or if instead the code somehow eliminates large numbers of impossible options early on. Also think about the difference between this code and our maze or sudoku programs. In those other two, recursion ends as soon as the computer finds one workable solution. What changes do we need to make here in order for the computer to find every possible answer? If you get stuck, try to make a corresponding change to the maze code, which is shorter and thus easier to debug.

3

One Step at a Time

In the game Word Ladder, a player is given a starting word and an ending word and asked to connect the two by proposing a sequence of intermediate words where each differs from the prior by only a single letter. The ladder from head to tail, for instance, might take us first to heal, then teal, then tell, then tall, before finally reaching tail. The voyage from cats to dogs might be faster, maybe with intermediate stops at only cots, then dots. At first, this game might sound like yet another maze, similar in structure to the games we considered last chapter. Indeed, picture a map where every city has a name, and cities with sufficiently similar names are connected by a road. Framed this way, Word Ladder really is just another maze to explore, where the challenge is to navigate from some starting city to some ending city using only real roads.

No surprise, then, that a computer can use backtracking to play this game. To find a path from head to tail, for instance, the computer could arbitrarily choose a word that can plausibly connect to head, maybe held. Then the computer could arbitrarily choose a word that connects to held, like weld. And on the computer could go, wandering aimlessly, always keeping track of both the choices made and those that might in the future need to be further explored. The computer would wander in this manner until it either stumbled onto the hoped-for word tail or hit a dead end, which here would mean running out of permissible words and thus triggering the normal backtracking process: remove the word most recently added, back up one intersection, and then recursively explore the remaining, previously ignored alternatives.

So why does Word Ladder warrant a new chapter and a new approach? Because backtracking like this can produce comically long ladders. Sure, the computer might by sheer luck connect good to code by way of the very efficient path of good to gold to mold to mole to mode to code. But the computer might instead flounder through a more roundabout journey, perhaps from good to hood to hook to look to lock to dock to duck to dusk to dust to bust to bush to busy to bury to burn to born to corn to core and then, finally, mercifully, to code. Backtracking makes no promise as to efficiency. It does not even try to find short paths. Backtracking promises only that the computer will keep looking until it either finds some workable route or has exhausted every possibility.

Car navigation systems do not work this way, and thank goodness. Imagine that you are parked in New York City, asking for directions to a nearby baseball stadium. The navigation system could correctly point out that you can travel first to Georgia, then Nevada, then Montana, then Ohio, then down to Florida, then along a wonderfully scenic route through New Hampshire, Delaware, and Rhode Island, and then drive six more hours to the stadium’s parking lot. And the computer would be right; that path would work. But navigation systems would not be useful if their claim to fame was merely that they exhaustively and relentlessly searched for a path. They are useful because they typically find the shortest available path, measured in the car example by either distance or time.

So how might a computer accomplish this more nuanced task in the context of Word Ladder? Suppose that we are trying to build a ladder from bold to hope, and that our word bank has just six words: bold, hope, hold, hole, home, and told. As a first step, let’s identify every word in the list that can be permissibly connected to our starting word, bold. Given the small word bank, we end up with just two options: hold and told.

From this, we know that there are only two two-word ladders that begin with bold. More importantly, we also know that there is no two-word ladder capable of reaching hope. Longer ladders might ultimately get us there, but, in two words, the best we can do is reach either hold or told. So we keep going.

In the first step, we expanded from the word bold. This time, let’s take all of our two-word ladders and expand them into their associated three-word options. The ladder bold-hold becomes two possible ladders: bold-hold-hole and bold-hold-told. The ladder bold-told becomes just one: bold-told-hold. At this point we have generated the full list of three-word ladders that begin at bold, and we now know that there are no three-word paths that connect bold to hope. In three words, the best we can do is reach hole, told, or hold. So we cycle through again.

The ladder bold-hold-hole can be expanded to bold-hold-hole-home and also, excitingly, to bold-hold-hole-hope. And we can stop there. We have already established that there are no three-word ladders capable of reaching hope, and we have now found a four-word ladder that does the job. Thus we can report back with confidence that we have found not just one possible ladder but in fact the shortest possible ladder: bold, hold, hole, hope.

Notice how this approach differs from backtracking. In backtracking, the computer picks a path at random and pursues it all the way to the end. Had the computer started with bold, for example, it might have tried bold-hold, then bold-hold-hole, then bold-hold-hole-home, and so on, until it either reached hope or ran out of permissible words. The computer as a result has no way of knowing whether any resulting ladder is in fact a good choice. At best, the computer knows only that there exists at least one legal path from bold to hope. Our new strategy, by contrast, guarantees that the computer will find the shortest option. After all, this approach builds all the two-word ladders first, then all the three-word ladders, then all the four-word ladders, and so on, until a winning path is found. Because every possible ladder is considered in order based on its length, the shortest path is never overlooked and indeed will always be the first winning path identified.

Computer scientists describe this as the difference between depth-first and breadth-first search. Our backtracking algorithm was a type of depth-first search because, in that algorithm, the computer pursued each path as far as it would go, considering other options only after the first was fully explored. The approach introduced in this chapter, by contrast, is breadth-first search because the computer considers the full breadth of two-word options before moving on to consider the full breadth of three-word options, all the way to a successful outcome.

The main drawback to breadth-first search? It can be slow to find any solution at all. Imagine a word pair where the winning ladder requires ten steps, and where the word bank consists of 4,000 four-letter words. To find the winning ladder, a computer implementing breadth-first search would first need to identify the, say, twenty two-word ladders that begin with the relevant starting word and end with some other word in the bank. Then, because each of those ladders might plausibly connect to twenty or so other words, the computer would need to create roughly 400 three-word ladders. Following this pattern, the computer would create something like 8,000 four-word ladders, 160,000 five-word ladders, 3.2 million six-word ladders, and later a breathtaking 25.6 billion nine-word options. Yes, this process would in the end identify the shortest possible ladder. But building all those prior ladders will sometimes prove impractical.
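The arithmetic behind those numbers can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the branching factor of twenty is the rough guess used above, and the function name is made up for illustration.

```python
BRANCHING = 20  # assumed average number of words that connect to any given word

def ladders_of_length(num_words, branching=BRANCHING):
    """Rough count of ladders that contain num_words words and share one starting word."""
    return branching ** (num_words - 1)

for num_words in range(2, 10):
    print(f"{num_words}-word ladders: about {ladders_of_length(num_words):,}")
```

Each extra word multiplies the count by twenty, which is why the totals climb from hundreds to billions in only a handful of steps.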

To see this process in action, let’s code it. We can use the list WORDS as our word bank, and we can define the function BREADTHFIRSTSEARCH() to take in the two necessary inputs: a STARTWORD and an ENDWORD. Inside the function, we can maintain an array called LADDERS where we will store all the ladders under consideration at any given time. Initially, the LADDERS array will store just one ladder: the one-word ladder consisting only of STARTWORD. After we start, however, LADDERS will always be populated with all of our then-current ladder possibilities. When we start to consider two-word ladders, for instance, LADDERS[0] will be the first ladder under consideration and LADDERS[0][0] will be the first word used in that ladder, while LADDERS[0][1] will be that ladder’s second word. Likewise, LADDERS[1] will store the second ladder currently under review and its specific words will be found at positions like LADDERS[1][0] and LADDERS[1][1]. When we are ready to think about three-word ladders, the LADDERS array will again store the relevant information. This time, LADDERS[0] will be the first three-word ladder, with LADDERS[0][0], LADDERS[0][1], and LADDERS[0][2] storing its first, second, and third words, respectively.

The function’s main loop should repeat until either a solution is found or the computer runs out of potential ladders. At the start of each pass, the variable LADDERTOEXPAND identifies by number the existing ladder on which we are hoping to build. The code then loops through all the words in WORDS, excludes any that have already been used in the ladder of interest, and, for each remaining permissible word, creates a new ladder that is made up of the original ladder plus that new connecting word. These new ladders are added to the end of the LADDERS array; the original, shorter ladder is removed from the array; and what had been the second ladder in LADDERS now moves into that first spot and hence becomes the ladder that will be analyzed in the next pass. Sample code appears below.
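One way to write that loop, offered as a sketch under stated assumptions rather than as the CodeLink code, uses the six-word bank from the earlier example. The helper is_connected() is a hypothetical name for the test of whether two words differ by exactly one letter, and the win check is done when a ladder reaches the front of the list.

```python
WORDS = ["bold", "hope", "hold", "hole", "home", "told"]

def is_connected(word_a, word_b):
    """True if the two words have the same length and differ in exactly one letter."""
    return (len(word_a) == len(word_b)
            and sum(a != b for a, b in zip(word_a, word_b)) == 1)

def breadth_first_search(start_word, end_word):
    """Return the shortest ladder from start_word to end_word, or None if none exists."""
    ladders = [[start_word]]             # begin with the one-word ladder
    while ladders:                       # stop when we run out of potential ladders
        ladder_to_expand = 0             # always build on the first-listed ladder
        ladder = ladders.pop(ladder_to_expand)
        if ladder[-1] == end_word:       # the first winner found is also the shortest
            return ladder
        for word in WORDS:
            # Skip words already used in this ladder, then keep legal connections.
            if word not in ladder and is_connected(ladder[-1], word):
                ladders.append(ladder + [word])   # new, longer ladder goes to the end
    return None

print(breadth_first_search("bold", "hope"))
```

Because shorter ladders always sit ahead of longer ones in the list, no four-word ladder is examined until every three-word ladder has been expanded.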

Now for the surprising twist: To change the function such that it conducts depth-first instead of breadth-first search, we would need to change only one line of code. Specifically, in each round, instead of expanding the first-listed ladder, we would expand the last-listed ladder, ignoring the rest, until we either found a winning path or ran out of permissible words.

Think about how that would work. At a moment when the last-listed ladder consists of two words, we would expand that ladder by adding to the end of the queue all the possible three-word ladders that follow from that specific two-word starter. Then, instead of expanding some other two-word starter like we would in a breadth-first implementation, we would turn to the last-added three-word ladder and expand that one, this time looking to see whether any four-word possibilities might work. From there, same thing: we ignore the two-word, three-word, and four-word ladders already in the queue and expand only the last ladder, which is our most recently added four-word option. And we keep going. In fact, other ladders receive attention only after the last-listed ladder has been fully but unsuccessfully expanded. If that happens, we remove the dud and start the process again by focusing on the new last-listed alternative.

This breadth-turned-depth implementation is shown below. Again, the code is almost identical to what we wrote for breadth-first search; the only significant change is that we now expand the last ladder in the LADDERS array rather than the first one.
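A self-contained sketch of the depth-first variant, again using the six-word bank and a hypothetical is_connected() helper as assumptions, differs from a breadth-first version only in the line that chooses which ladder to expand.

```python
WORDS = ["bold", "hope", "hold", "hole", "home", "told"]

def is_connected(word_a, word_b):
    """True if the two words have the same length and differ in exactly one letter."""
    return (len(word_a) == len(word_b)
            and sum(a != b for a, b in zip(word_a, word_b)) == 1)

def depth_first_search(start_word, end_word):
    """Return some ladder from start_word to end_word (not necessarily the shortest)."""
    ladders = [[start_word]]
    while ladders:
        ladder_to_expand = len(ladders) - 1   # the one changed line: take the LAST ladder
        ladder = ladders.pop(ladder_to_expand)
        if ladder[-1] == end_word:
            return ladder
        for word in WORDS:
            if word not in ladder and is_connected(ladder[-1], word):
                ladders.append(ladder + [word])
    return None

print(depth_first_search("bold", "hope"))
```

With this word order the function returns the six-word ladder bold, told, hold, hole, home, hope, even though a four-word ladder exists. That is the trade-off in miniature: depth-first search commits to one path and reports whatever works.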

The decision of whether to use breadth-first or depth-first search depends on the goals and context of the search. For this chapter’s Word Ladder game, the choice was clear because ladder length is such an important consideration in that setting. But games like sudoku, Eight Queens, Boggle, and Scrabble raise more difficult questions. For example, is there any reason to prefer either breadth-first or depth-first search as applied to Eight Queens given that our goal there is to find every solution, not just some special one? Similarly, if there are forty-three empty spaces on a sudoku board, are these two search strategies in essence interchangeable because every solution will inevitably have forty-three parts? In short, what issues should be on your mind the next time you are considering the choice between depth-first and breadth-first search?

Chapter Challenge

Look back at the example where we built a ladder from bold to hope. At one point in the analysis, we had already found the two-word ladder bold-told, and we added the potential three-word ladder bold-hold-told. That seems inefficient; if we want to get to told, we already have a shorter path: bold-told. Moreover, if the winning path does not go through told, then neither the two-word nor the three-word ladder will matter anyway. Given that, should we adjust our breadth-first code such that the computer eliminates seemingly redundant ladders like bold-hold-told rather than storing and expanding them? Will there ever be a time when deleting a list like bold-hold-told stops the computer from finding the optimal path? Test different examples and see what you think.

Turn-Based Strategy Games

4

Whose Turn Is It Anyway?

Nim is a classic strategy game typically played using coins. At the start, some number of coins are made available and the first player is offered the chance to take either one or two of them. Then it is the second player’s turn, and that player likewise can choose to take either one or two coins. Play goes back and forth until one of the two players takes the last coin, winning the game. Suppose you are starting a game of Nim where there are seven coins and you go first. How many coins should you take on your very first move? And, more importantly, how do you know?

The best move for you will depend on whatever you think your friend will do in response. If you take two coins, for instance, your friend will see five coins remaining and will take whatever number of coins makes the most sense in a world with just five coins. If you take one coin, by contrast, your friend will see six coins remaining and will make whatever move is best in light of there being six coins on the table. Of course, your friend’s move will depend on what your friend thinks you will do when it is again your turn. So if you take two coins and your friend is thinking about taking two coins, your friend has to consider what will happen next, when it is your turn once more, and you see those three remaining coins. In the alternative, if you take two coins and your friend is thinking about taking one coin, your friend has to consider what will happen when it is your turn and there are four coins left.

Recognize the pattern? The analysis here is recursive. My best move depends on your best move in light of my move, which depends on my best move in light of your best move in response to my original move, which depends on your best move in light of my best move in response to your best move in response to my original move, and so on, through the full series of possible moves. This is in essence our ice cream example all over again, where the number of people in front of me in line depends on the number of people in front of the person directly in front of me in that same line.

One complexity, however, is that any recursive function here must change perspective. At the start, the function is looking for the best move from the first player’s perspective. But, to do that, the function must consider the best responsive move from the second player’s perspective, and, later still, the best further response from the first player’s perspective. Thus, unlike the recursive functions we considered in chapter 2, the recursive functions here will have to carefully track whether the goal of the pending function is to pull out a win for you, as the code is at that moment looking for your best move, or a win for me, as the code is instead at a stage in the analysis where the goal is to anticipate my best response to some prior move you made.

The easiest way to capture all that in code is to break the work into two related functions. Call the first one WHATHAPPENS(). We will write this function below, but for now let’s plan that it will take as input the number of coins available and a proposed number of coins to remove, and then it will return as output the answer to whether taking that number of coins will lead to a win or a loss. Call the second function FINDBESTMOVE(). This one will take as input the number of coins available and return as output two pieces of information: the move that the computer thinks is the best option, and the information derived from WHATHAPPENS() about whether that best move will lead to a win.

Now we can write FINDBESTMOVE(). The function starts by calling WHATHAPPENS() with a proposed move of taking just one coin. Next, the function calls WHATHAPPENS() again, this time with a proposed move of taking two coins. If taking one coin leads to a win, the function returns a suggestion to take one coin, and it returns the prediction that this move will be a winning move. If not, the function suggests the only other option available, namely removing two coins, and it reports back whatever outcome it expects.

Turn now to WHATHAPPENS(). Remember, this function takes as input both the total number of coins still available and also the proposed move, and it reports back whether the proposed move will lead to a win for the moving player. To do that, the function first needs to test the proposed move by creating a new game and actually removing those coins in the context of that new, hypothetical game. If zero coins remain, the proposed move is in fact a winning move, so the function can endorse it. If not, however, the function needs to implement our planned recursion, specifically by calling FINDBESTMOVE() to determine the best move for the next player after the proposed coins have been taken.

The return here is then the tricky part. If the opponent’s best response leads to a win, we need to report that back as a loss, since a win for my opponent is a loss for me. Conversely, if the opponent loses, we need to report that as a win for the inverse reason. Working Python code might look like what’s given below.
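A compact sketch of the two functions follows. The names mirror WHATHAPPENS() and FINDBESTMOVE(), but the exact signatures and return values are assumptions; in particular, the guard that skips the two-coin option when only one coin is left is a detail added here.

```python
def what_happens(coins, move):
    """Return True if taking `move` coins from `coins` leads to a win for the mover."""
    remaining = coins - move              # play the move in a hypothetical game
    if remaining == 0:
        return True                       # taking the last coin wins outright
    _, opponent_wins = find_best_move(remaining)
    return not opponent_wins              # a win for my opponent is a loss for me

def find_best_move(coins):
    """Return (coins_to_take, wins) for the player about to move."""
    one_wins = what_happens(coins, 1)
    two_wins = what_happens(coins, 2) if coins >= 2 else False
    if one_wins:
        return 1, True
    return 2, two_wins                    # the only other option, win or lose

print(find_best_move(7))
```

The flip in the last line of what_happens() is where the change of perspective lives: each level of recursion answers the question for whichever player is about to move, and the caller inverts that answer.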

This coding framework can be applied to a wide range of two-player games. That is, a wide range of two-player games follow a pattern where my best move depends on your best move, your best move depends in turn on my next best move, and so on down the line until some stable, nonrecursive case. And in those games, a computer can reason through the various permutations by using recursive functions like FINDBESTMOVE() and WHATHAPPENS().

Don’t believe me? Try tic-tac-toe. The decision tree shown below depicts a game in progress between a human player (P) and the computer (C). The board at the very top of the figure summarizes the game so far: the computer has taken three spaces, the human player has taken three spaces, and neither of the two has three in a row. Below that board, the figure shows the three options available to the computer for its next move: the computer can take the middle right, bottom right, or bottom left space. The figure continues from there by showing the responses available to the human player after each possible computer move. If the computer takes the middle right space, for example, the human player can take either the bottom left or bottom right. If the computer takes the lower right space, by contrast, the human player can take either the lower left or middle right. Every possible sequence of moves is ultimately represented in the drawing, with each path ending either with someone winning the game or with the board being completely full.

This decision tree depicts all the possible moves that remain in an in-progress game of tic-tac-toe. The top board shows the game in its current state. The next three boards show the three moves available to the computer. The next four capture possible responses from the human player. Finally, the bottom two represent the computer’s possible further responses.

The computer should determine its first move in the same way we determined the best move in the coin game. There are two nonrecursive cases: a player achieves three in a row, or the game ends in a tie. The recursive case, meanwhile, needs to test every open space and ask whether choosing that space will ultimately yield a win for the player from whose perspective the function was called, a win for the opposing player, or a tie game.

Let’s actually work through this example. The next graphic is a zoomed-in look at the left side of the above decision tree. Here, the computer puts its mark in the middle right space, which leaves the human player with two choices: win the game by placing a mark in the bottom left space, or place a mark in the bottom right and set the stage for a loss. The human player will clearly choose to win, which means the computer should not follow this branch in the original decision tree.

The computer will next consider the middle branch of the tree, where it places its mark in the bottom right space. Take that spot and the human player has two choices: win immediately by placing a mark in the lower left space, or place a mark in the middle right space and ultimately lose the game. The human player will once more go for the win and thus this branch, too, is an unattractive option for the computer. Take bottom right and the human player wins the game. Not good.

That leaves the third branch. If the computer places its mark in the lower left space, the computer immediately wins. There is no need for a recursive step. This move guarantees a win and is therefore the optimal choice from the computer’s perspective. The computer should take lower left.

Our recursive tic-tac-toe analysis can be automated using the same functions we developed for the coin game, but with some minor game-specific adjustments. Start with FINDBESTMOVE(). For the coin game, that function took as input the number of coins remaining and returned as output both the number of coins to take and a prediction as to whether that move would lead to a win. For tic-tac-toe, the input needs to be slightly more complicated, in that it needs to include not only information about the nine spaces on the gameboard but also an indicator as to whether the proposed move is being made for the human player or the computer. The nine spaces are necessary because the computer cannot evaluate the game without seeing the full gameboard. And the player parameter is necessary because the computer needs to know whether to add a P or a C to the board.

The structure of FINDBESTMOVE() is also slightly more complicated this time. Last time, we had it easy: we tested the possibility of taking one coin and, if that did not lead to a win, we were able to assume that taking two coins was either a better choice or, at least, no worse. That allowed for a very streamlined function. This time, we have to test every empty space, keep track of the results, and at the end report back our normal output: the best move out of all the options along with our prediction as to whether that move will lead to a win, a loss, or, a new wrinkle here, a tie.

Sample Python code is shown below. The loop in the middle of the function considers each space, checks to see if it is empty, and, if so, tests to see what would happen if this player put its mark in that space. The rest of the code then keeps track of the best option thus far. Specifically, the variable CURRENTBESTMOVE stores the row number and column number of the best-so-far move, and the variable CURRENTBESTOUTCOME indicates whether that move was a loss, a win, or a tie. Both variables are initially set to nonsense values of −99, which will immediately be replaced by the real values associated with the first move the computer tests. After that, if a later space turns out to deliver an even better outcome, the computer will update these variables again to record the new, better row/column pair and the new, better win/loss/tie outcome.

Our WHATHAPPENS() function, meanwhile, turns out to be almost identical to the namesake function we used in the coin game. The inputs are the current board, the proposed row/column to play, and the identity of the player. The output is the expected game result, which will be either a win, a loss, or a tie. Note that the function calls two game-specific functions that are not shown in the excerpt but are straightforward to code. One is a SCOREBOARD() function that takes the gameboard as input and checks to see if (a) either player has three in a row or (b) there are no spaces left and hence the game is a draw. The other new function, GENERATETESTBOARD(), generates a new gameboard on which the computer can test moves without accidentally changing the real board. We did not need functions like these in the coin game because, in that game, the “board” was just a single number. Here, by contrast, the gameboard is an array of nine spaces, so it takes a little more work both to evaluate it (SCOREBOARD) and to duplicate it (GENERATETESTBOARD).
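
To make the pieces concrete, here is a rough sketch of how the four tic-tac-toe functions could fit together. Everything about the representation is an assumption on my part: the board is a list of three rows holding "P", "C", or a blank, outcomes are the numbers 1, 0, and −1, and the sample board at the bottom is a hypothetical position built to match the decision tree described earlier, not a copy of the figure.

```python
import copy

WIN, TIE, LOSS = 1, 0, -1  # hypothetical outcome codes
LINES = ([[(r, c) for c in range(3)] for r in range(3)]      # rows
         + [[(r, c) for r in range(3)] for c in range(3)]    # columns
         + [[(i, i) for i in range(3)], [(i, 2 - i) for i in range(3)]])


def score_board(board):
    """Return "P" or "C" for three in a row, "tie" for a full board, else None."""
    for line in LINES:
        marks = {board[r][c] for r, c in line}
        if len(marks) == 1 and marks != {" "}:
            return marks.pop()
    if all(cell != " " for row in board for cell in row):
        return "tie"
    return None


def generate_test_board(board, row, col, player):
    """Copy the board and play the move on the copy, leaving the real board alone."""
    test = copy.deepcopy(board)
    test[row][col] = player
    return test


def what_happens(board, row, col, player):
    test = generate_test_board(board, row, col, player)
    result = score_board(test)
    if result == player:
        return WIN
    if result == "tie":
        return TIE
    rival = "C" if player == "P" else "P"
    _, rival_outcome = find_best_move(test, rival)
    return -rival_outcome  # the rival's win is my loss, and vice versa


def find_best_move(board, player):
    current_best_move = -99      # nonsense values, replaced by the first
    current_best_outcome = -99   # move the loop actually tests
    for row in range(3):
        for col in range(3):
            if board[row][col] != " ":
                continue
            outcome = what_happens(board, row, col, player)
            if outcome > current_best_outcome:
                current_best_move = (row, col)
                current_best_outcome = outcome
    return current_best_move, current_best_outcome


board = [["C", "C", "P"],
         ["C", "P", " "],
         [" ", "P", " "]]
print(find_best_move(board, "C"))  # ((2, 0), 1): take bottom left and win
```

Because a tie is coded as 0 and negating 0 leaves it unchanged, the same sign flip that worked for the coin game handles the new tie outcome without any special cases.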

Computer scientists call this recursive strategy minimax. That name is a nod to the fact that when the function is working from the player’s perspective, the goal is to maximize the outcome, but when the function is working from the opponent’s perspective, the goal is to minimize the outcome. You can see the algorithm in action by following this CodeLink. There, I put not only a full working Python implementation of minimax tic-tac-toe but also some debug code that will allow you to see exactly what the computer is doing as it plays the game.

The biggest drawback to minimax is that it can be computationally intense. Ask the computer to use minimax to find the first move in a game of tic-tac-toe, for example, and the computer will generate hundreds of thousands of experimental gameboards before making its pick. As we will explore in the next chapter, there are ways to improve the algorithm’s efficiency, but there are limits. Inevitably, testing every plausible pattern of move and response is going to take time. Later in the book, we will consider other two-player algorithms that are less comprehensive but potentially much faster.

Chapter Challenge

In Connect Four, two players take turns dropping checkers into a seven-column, six-row vertical grid. Dropped pieces fall to the lowest empty space in the column into which they are dropped. Each player’s objective is to be first to “connect” four of their checkers vertically, horizontally, or along any straight-line diagonal.

The top two boards demonstrate the basic mechanics of the game: if the computer drops a checker into the third column, it falls to the lowest available space in that column. The remaining three boards show, in order, a horizontal, a vertical, and a diagonal win.

Your challenge for this chapter is to write a program in which the computer uses minimax to play Connect Four against a human player. As always, my CodeLink will get you started, giving you the functions you need to create the board, drop checkers, and determine whether either player has four in a row. And note that my code joins a game already in progress. I did this because your code would be painfully slow if it had to analyze the game from the very first move. That said, once you have working code, remove pieces from my original gameboard and . . . see what happens. Any ideas on how to speed things up?

5

Move Faster

At the end of the previous chapter, I challenged you to finish implementing my Connect Four code. Most likely, your version ended up performing well on the half-played gameboard I built into the sample code but was unworkably slow when you tried to have the computer play a new game starting from a closer-to-empty board. As you surely suspect, the problem here is minimax itself.

When a computer uses the standard minimax algorithm, it plays out every possible move on every possible gameboard. For my partially played Connect Four example, that posed no problem. Because so many spaces were already filled in, the computer only needed to test roughly 200 boards in order to pick its first move, and it needed to test even fewer boards as more and more spaces were taken. That worked fine. For an empty tic-tac-toe board, the numbers are bigger but still tolerable, clocking in at something close to 550,000 boards for the first move. For a computer, that’s manageable. But for a completely empty Connect Four gameboard, minimax would require the computer to evaluate trillions and trillions of gameboards before making its opening drop. So what can we do?
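
If you would like to check the tic-tac-toe figure yourself, a short counting sketch will do it. The flat nine-character board and the function names below are my own assumptions, not the CodeLink code. The function counts a board, then every board reachable from it, stopping a branch as soon as someone has three in a row.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def has_three_in_a_row(board, player):
    return any(all(board[i] == player for i in line) for line in LINES)


def count_boards(board, player):
    """Count this board plus every board full minimax would reach from it."""
    total = 1
    rival = "C" if player == "P" else "P"
    if has_three_in_a_row(board, rival):
        return total  # the previous mover just won, so play stops here
    for i in range(9):
        if board[i] == " ":
            board[i] = player                    # try the move
            total += count_boards(board, rival)
            board[i] = " "                       # undo it
    return total


total_boards = count_boards([" "] * 9, "C")
print(total_boards)  # 549946, counting the empty starting board
```

The count takes a second or two, which is itself a small preview of the problem: the same exhaustive walk on a Connect Four board would not finish in any reasonable amount of time.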

One option is to limit how far ahead the computer looks. When you play Connect Four, you probably plan at most two or three moves at a time. You are considering dropping your checker in column 3, maybe, and you think, “If I drop my checker in column 3, my opponent will respond by blocking in column 5 or building in column 2, which will then give me the chance to add a helpful checker in column 2 or 7.” In that example, you are thinking two moves ahead: when considering your move, you are also thinking about your opponent’s response and your response to your opponent’s response. Importantly, however, you are not thinking about your response to their response to your response to their response to your response to their response to your response to their response. As a result, your move might not turn out to be optimal, but, by thinking a few moves ahead, you probably make reasonably good choices in a reasonably short amount of time.

We can adjust our code to similarly restrict the computer so that it thinks just a few moves ahead. Below, we will do that by first adding a counter that keeps track of how many recursive calls have been triggered and then adding code that stops the recursion whenever some maximum number is reached. Before we take those steps, however, we have to do something a little creative: we have to figure out how we are going to interpret partially played gameboards.

Here’s the issue. Up until now, our WHATHAPPENS() function has always predicted either a win for one of the players or a tie. There were no other possible outcomes; every move led to a recursive process that would only end after either the computer won, the computer’s opponent won, or the board was filled and hence deadlocked. The WHATHAPPENS() function did not need to consider any other possibility because no other outcome was in fact possible. The recursion played until the end of every game.

In this chapter, by contrast, we are intentionally stopping short. That is, we are limiting recursion, which means that the computer might have to stop the process at a time when there is neither a win nor a tie but instead just some intermediate gameboard with some number of pieces already played. The computer nevertheless must score that board, offering some characterization that will allow comparisons between that board and other possible future boards. In essence, the computer now needs to make the-game-is-not-over-yet judgments like “If I make this move, my chances for a future win look really good,” or “If my opponent drops a checker there, they will have a commanding lead, so I need to avoid that pattern if possible.”

This work can be done inside our standard SCOREBOARD() function. The opening few lines of code are easily sketched. If the SCOREBOARD() function is called on a board where the computer has four checkers in a row, the function can return a score of +1000, which is just an arbitrarily high number that we can use to signify a win for the computer. Conversely, if SCOREBOARD() is called on a board where the computer’s rival has four checkers in a row, the function can return a score of −1000, which is similarly an arbitrarily low number that we can use to signify a win for that other player. If SCOREBOARD() is called on a board that has no empty spaces, meanwhile, the function can return a score of 0 because a tie is an outcome that can be conceptualized as the midpoint between the computer winning (our big positive number) and its rival winning (our big negative number), which makes 0 an attractive choice.

But what about everything else? Consider, for instance, the first gameboard shown on the left below. If the computer stops its recursion at a time when the board looks like this, how should the computer report the score? Is this a board that favors the computer and so should have a score above 0 but below the “great news, the computer definitely wins” number of +1000? Is this a board that favors the human player and so should have a score below 0 but above the “I can’t believe it; the human will win” number of −1000? And what number, exactly, should we use to characterize this board such that it can be thoughtfully compared to other possible in-progress gameboards like the two sample boards shown to the right?

The computer needs to score intermediate boards like these, assigning each board a number that can be used to compare it to other possible gameboards in the tree.

It turns out that there is no “right” or “wrong” way to score an in-progress Connect Four gameboard. But here’s one simple approach, just to get us started. Let’s award ten points every time a player has three checkers in a row and there is an empty fourth spot available for a possible future win. Let’s award five points every time a player has two checkers in a row and there are two empty spots available for a possible future win. Lastly, let’s award three points for every checker that a player has in column 4, which is the middle column and hence tends to be particularly valuable strategically. A gameboard’s score can then be calculated by taking the total number of points the computer earns on that board and subtracting the total number of points that its rival earns on that same board. A positive number would thus indicate that the board looks better for the computer than its rival, and a negative number would tell us that the board instead leans the other way. An example follows on the next page.
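
One way to turn that scoring recipe into code is sketched below. Several details are assumptions rather than anything fixed by the recipe: the board is a list of six rows (row 0 at the top) holding "C", "P", or a blank; a "three in a row with an empty fourth spot" is read loosely as any run of four cells containing three of a player's checkers and one blank; and a pattern that fits inside more than one run of four is counted once per run.

```python
ROWS, COLS = 6, 7


def windows(board):
    """Yield every horizontal, vertical, and diagonal run of four cells."""
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + 3 * dr, c + 3 * dc
                if 0 <= end_r < ROWS and 0 <= end_c < COLS:
                    yield [board[r + i * dr][c + i * dc] for i in range(4)]


def player_points(board, player):
    """Add up one player's points under the ten/five/three recipe."""
    points = 0
    for window in windows(board):
        mine, empty = window.count(player), window.count(" ")
        if mine == 3 and empty == 1:
            points += 10  # three checkers plus an open fourth spot
        elif mine == 2 and empty == 2:
            points += 5   # two checkers plus two open spots
    middle = COLS // 2    # column 4 in the text is index 3 here
    points += 3 * sum(1 for r in range(ROWS) if board[r][middle] == player)
    return points


def score_board(board):
    """Score a board from the computer's side: +1000, -1000, 0, or a heuristic."""
    for window in windows(board):
        if window == ["C"] * 4:
            return 1000
        if window == ["P"] * 4:
            return -1000
    if all(cell != " " for row in board for cell in row):
        return 0
    return player_points(board, "C") - player_points(board, "P")


board = [[" "] * COLS for _ in range(ROWS)]
board[5][3] = board[5][4] = "C"  # two computer checkers side by side, bottom row
print(score_board(board))        # 18: three open runs of four (15) plus one middle checker (3)
```

Whether gapped patterns and overlapping runs deserve full credit is exactly the kind of choice worth experimenting with.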

With all that in mind, we can now write the rest of the program. Start with the FINDBESTMOVE() function. As always, the function takes as input both the current gameboard and the identity of the player whose turn it is to move. Inside the function, we still need to consider all the possible moves. Last time, that meant considering the possibility of putting an X or an O in any space on a tic-tac-toe board; this time, it means considering the possibility of dropping either the computer’s checker or the opponent’s checker in any of the seven columns. Lastly, when the function concludes, we still need to return as output both the best move we have found and also an indicator of whether that move leads to a win for the computer, a win for the rival, a tie, or something else.

The only significant change to FINDBESTMOVE() is that, in this version, we need to introduce an additional input: a counter that keeps track of how many future moves the computer will be allowed to consider when conducting further recursive evaluations. For instance, if we want the computer to look just two moves ahead—that is, if we want the computer to pick its move by looking at the opponent’s possible response and also the computer’s follow-up to that response, but nothing more—we would set this depth parameter to 2. If we instead want to empower the computer to consider all that plus also the opponent’s further response to the computer’s follow-up response, we would set the depth parameter to 3. Our revised code is given below.

As for WHATHAPPENS(), in prior versions this function has taken as input the gameboard, the proposed move, and an indication as to whether the computer should consider the game from the computer’s perspective or the rival’s perspective. This time, we need to input all of that plus also the depth information. Why? Because this function, too, is capable of making recursive calls, and thus this function, too, needs to know how many calls have already been made and hence whether additional calls are permissible. Operationally, this requires two specific changes to the code. First, when the function is initially called, it needs to check whether the depth counter is at 0. If so, the computer should stop its analysis and return whatever it knows about the predicted outcome. Second, each time a recursive call is made, the depth counter needs to be reduced by one, in essence to acknowledge that by making a recursive call the computer is using one of its limited number of sneak peeks.
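
The two changes can be seen in the following sketch of a depth-limited search. Treat it as an illustration under stated assumptions, not as the CodeLink implementation: the board layout, the helper names `drop`, `four_in_a_row`, and `is_full`, and the deliberately crude scoring (middle-column checkers only) are all mine. Scores are always from the computer's side, so the computer ("C") looks for the highest score and its rival ("P") for the lowest.

```python
import copy

ROWS, COLS = 6, 7


def drop(board, col, player):
    """Return a copy of `board` with `player`'s checker dropped into `col`."""
    test = copy.deepcopy(board)
    for row in range(ROWS - 1, -1, -1):  # the bottom row is ROWS - 1
        if test[row][col] == " ":
            test[row][col] = player
            return test
    raise ValueError("column is full")


def four_in_a_row(board, player):
    for r in range(ROWS):
        for c in range(COLS):
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                cells = [(r + i * dr, c + i * dc) for i in range(4)]
                if all(0 <= rr < ROWS and 0 <= cc < COLS
                       and board[rr][cc] == player for rr, cc in cells):
                    return True
    return False


def is_full(board):
    return all(board[0][c] != " " for c in range(COLS))


def score_board(board):
    """Score from the computer's side; the last case is a crude stand-in."""
    if four_in_a_row(board, "C"):
        return 1000
    if four_in_a_row(board, "P"):
        return -1000
    if is_full(board):
        return 0
    middle = [board[r][COLS // 2] for r in range(ROWS)]
    return 3 * (middle.count("C") - middle.count("P"))


def what_happens(board, col, player, depth):
    test = drop(board, col, player)
    score = score_board(test)
    # Change 1: stop when the game is over or the lookahead budget is spent.
    if abs(score) == 1000 or is_full(test) or depth == 0:
        return score
    rival = "P" if player == "C" else "C"
    # Change 2: each recursive call uses up one level of depth.
    _, outcome = find_best_move(test, rival, depth - 1)
    return outcome


def find_best_move(board, player, depth):
    best_col, best_score = -99, None
    for col in range(COLS):
        if board[0][col] != " ":
            continue  # this column is full
        score = what_happens(board, col, player, depth)
        if (best_score is None
                or (player == "C" and score > best_score)
                or (player == "P" and score < best_score)):
            best_col, best_score = col, score
    return best_col, best_score


board = [[" "] * COLS for _ in range(ROWS)]
board[5][0] = board[5][1] = board[5][2] = "P"  # rival threatens the bottom row
board[5][6] = board[4][6] = "C"
print(find_best_move(board, "C", depth=1))     # (3, 0): block at column index 3
```

With a depth of 1, the computer sees its own move and the rival's reply, which is just enough to notice the threat along the bottom row and block it.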

Chapter Challenge

This CodeLink will take you to a full implementation of a depth-limited Connect Four program, including all of the code excerpted above and also the SCOREBOARD() function. Pause here and test it all. Using this approach, does the computer ever make a painfully bad choice? Can you improve our scoring algorithm in ways that help the computer make noticeably better choices? And what happens when you increase the maximum recursive depth? Does the computer make better decisions? At what cost?

To really understand both the power and the peril associated with depth limits, your main challenge for this chapter is to rewrite last chapter’s tic-tac-toe program, this time using limited depth. Start by allowing the computer to run the full minimax algorithm, just as before. How deep does the computer go when free to pursue the full analysis? Does that answer change as the game progresses?

From there, see how gameplay is impacted by various constraints. If the computer is allowed to look four moves ahead, does it still win most games? If so, can you cut depth further without undermining performance? If not, does the computer tend to make its mistakes early in the game or late? That is, when do depth limits matter most to the computer’s decision-making? To confirm your intuitions, write a version of the code where the computer chooses its move without any depth limit, then chooses its move subject to a depth limit, and reports back both options. What do those results show us about the consequences of limiting minimax this way?

6

Pruning the Tree

So far, our Connect Four program has severely limited the depth of the computer’s recursive analysis because, especially at the beginning of a game, minimax can be incredibly slow. That is why we added a depth counter to our standard functions; we knew that deep recursion meant delay, so we traded depth for speed. But what if I were to tell you that we can do better? Specifically, what if I told you that some of the delay forcing us to limit depth is entirely our fault because we are needlessly spending time checking moves that have no chance of being the relevant player’s best move?

Take an example. Suppose we are in the middle of a game, it is the computer’s turn to play, and the gameboard looks as shown below. Our algorithm starts by imagining that the computer drops a red checker in the first column. It then dutifully tests all seven of the rival’s possible responses. That is, it considers the possibility that the rival will drop a responsive checker in the first column. It considers the possibility that the rival will drop a responsive checker in the second column. It considers the possibility that the rival will drop a responsive checker in the third column. And so on, through all seven columns.

But that is incredibly wasteful. After all, if the computer drops a red checker in the first column and then the rival drops its responsive yellow checker in that same first column, the rival wins the game. Knowing that, the computer should not bother exploring the possibility of its rival instead dropping a responsive checker in the second, third, fourth, fifth, sixth, or seventh columns. If the computer chooses the first column, its rival can win by also choosing the first column, and thus the computer should cut short its analysis, record this bad outcome, and move on to the next relevant question: Might the computer be better off dropping its red checker in the second column?
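
The same shortcut is easy to try on a small scale. The sketch below uses tic-tac-toe instead of Connect Four so that the numbers stay tiny, and it assumes a flat nine-character board and a hypothetical `stop_early` switch of my own invention. With the switch on, the loop quits as soon as the moving player finds a winning move, since no other move can do better than a win.

```python
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]
WIN, TIE = 1, 0  # a loss is -1, the negation of a win


def wins(board, player):
    return any(all(board[i] == player for i in line) for line in LINES)


def search(board, player, stop_early):
    """Return (best outcome for `player`, number of test boards generated)."""
    rival = "P" if player == "C" else "C"
    best, boards = None, 0
    for i in range(9):
        if board[i] != " ":
            continue
        board[i] = player  # try the move
        boards += 1
        if wins(board, player):
            outcome = WIN
        elif " " not in board:
            outcome = TIE
        else:
            rival_outcome, extra = search(board, rival, stop_early)
            outcome, boards = -rival_outcome, boards + extra
        board[i] = " "     # undo the move
        if best is None or outcome > best:
            best = outcome
        if stop_early and best == WIN:
            break          # nothing beats a win, so skip the remaining moves
    return best, boards


# A hypothetical position: C C P / C P _ / _ P _ with the computer to move.
print(search(list("CCPCP  P "), "C", stop_early=False))  # (1, 9)
print(search(list("CCPCP  P "), "C", stop_early=True))   # (1, 3)
```

Both runs reach the same verdict, a win for the computer, but the second generates three boards instead of nine. Try the same comparison starting from an empty board to see how the savings scale.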

This diagram shows only a small portion of the tree that the computer builds in our existing code. The top board depicts the game as it currently stands. The three boards below it show a few of the moves available to the computer. The four boards below those show a few of the responses that the computer’s rival might play. The interesting point: there is no reason to evaluate several of these gameboards because the rival clearly is going to win if the computer drops a checker in that first column.

Now let’s try a slightly more complicated example. Suppose we are in a situation where the computer can consider only two moves: either dropping a checker in the column at the far left or dropping a checker in the column at the far right. After the computer moves, its rival will similarly be free to drop a checker on the far left or far right. And after that, the computer will have one more chance to drop a checker on either the far left or the far right. Note that I am constraining this example to allow the players to use only those two columns because I want to be able to easily draw the full decision tree. Without that constraint, the tree would be much larger since the computer would have to consider other possible moves, such as dropping a checker in the second, third, fourth, fifth, or sixth columns, and the computer’s rival would similarly have to consider other possible moves, again including dropping a checker in the second, third, fourth, fifth, or sixth columns.

Even with this constrained set of choices, however, a full recursive analysis would mean evaluating eight different potential gameboards before making the move. That is, the computer would evaluate a first possible board where the computer takes the left column, its rival also takes the left column, and the computer then takes the left column. The computer would next evaluate a second possible board where it drops a checker in the left column, its rival again also chooses the left column, but this time in response the computer goes right. Continuing, the computer would evaluate a third possible board where it takes left, the rival takes right, and the computer takes left. Then there would be a fourth board (left; right; right), a fifth board (right; left; left), a sixth board (right; left; right), a seventh board (right; right; left), and finally an eighth board (right; right; right), all as shown in the diagram on the next page.

Analyzing eight gameboards is a lot of work. So let’s see if the computer can skip some. Start with gameboard permutations 1 and 2. Suppose that the first gameboard scores a −4, which means that it favors the computer’s rival a little. Seeing this, the computer should go ahead and score the second gameboard, hoping for a better result. Let’s suppose that the second gameboard gives +2. The computer’s choice here is clear: if this situation arises, the computer should go right and take the +2 rather than going left and suffering the −4. And just to figure that out we already had to evaluate two gameboards.

现在让我们回溯到树的上层。从前两个棋盘排列中,我们已经知道,如果计算机向左走,然后它的对手也向左走,计算机就会向右走,结果就是棋盘得分+2。然而,计算机的对手并没有义务向左走。在游戏的那个时刻,对手可以选择向右走。所以,目前还没有捷径;计算机需要看看在这种情况下,它的对手是否可以通过向右走而不是向左走来做得更好。

Now let’s work back up the tree. From the first two gameboard permutations, we already know that if the computer goes left and then its rival responds by going left, the computer will go right and the result will be a gameboard that scores +2. The computer’s rival was not obligated to go left, however. At that point in the game, the rival could instead have chosen to go right. So, no shortcuts yet; the computer needs to look to see if its rival might do better by going right in this situation instead of going left.

这就引出了棋盘排列3和4。如果计算机向左走,而对手向右走,计算机就必须在向左到棋盘3和向右到棋盘4之间做出选择。计算机会选择哪一个?要回答这个问题,我们需要更多信息,所以我们必须至少对第三个棋盘进行评分。假设我们这样做了,并且第三个棋盘得分+6。我们还需要对第四个棋盘进行评分吗?

That takes us to gameboard permutations 3 and 4. If the computer goes left and its rival responds by going right, the computer will have to choose between going left to gameboard 3 or going right to gameboard 4. Which will the computer pick? To answer that question, we need more information, so we have to score at least the third gameboard. Suppose we do and the third gameboard scores +6. Do we need to score the fourth gameboard too?

乍一看,你可能会认为计算机需要知道第四个棋盘的得分是否高于第三个,因为它总是想选择得分最高的棋盘。但如果我们的目标是弄清楚计算机的对手在走棋时会怎么做,那么就没有理由完成这条分析路线。为什么?因为即使对第四个棋盘一无所知,我们也已经知道计算机的对手不会走右。这样想一想:如果对手走右,计算机的反应要么是选择棋盘 3 并获得+6分,要么是选择棋盘 4(如果结果显示 4 的得分甚至高于+6)。无论哪种方式,对手最好走左,并且锁定我们已经知道的+2,而不是冒险得到+6,或者根据棋盘 4 的结果,得到对手认为更不理想的结果,例如+8或+50。

At first blush, you might think that the computer needs to know whether the fourth gameboard scores higher than the third because the computer always wants to pick the highest-scoring gameboard available. But if our goal is to figure out what the computer’s rival will do on its move, there is no reason to finish this line of analysis. Why? Because even without knowing anything about the fourth gameboard, we already know that the computer’s rival will not go right. Think about it this way: if the rival were to go right, the computer would respond by either picking gameboard 3 and earning a score of +6, or picking gameboard 4 if that turns out to offer a score even greater than +6. Either way, the rival is better off going left and locking in the +2 we already know about rather than risking either that +6 or, depending on how gameboard 4 turns out, an outcome that the rival would consider even less desirable, like +8 or +50.

换句话说,计算机不需要评估棋盘4,因为它永远无法选择它。棋盘3对计算机来说已经非常有利,以至于计算机的对手会完全避开这部分棋盘。因此,棋盘4的实际价值无关紧要。认识到这一现实,我们就省去了评估一个棋盘的麻烦。

Put differently, the computer does not need to evaluate gameboard 4 because the computer will never be in a position to choose it. Gameboard 3 is already so good for the computer that the computer’s rival will avoid this part of the tree entirely. The actual value of gameboard 4 is thus irrelevant. And by recognizing that reality, we have just saved ourselves the trouble of evaluating one gameboard.

等等,还有更多。计算机现在知道,如果它通过在最左边的列中放置一个棋子来开始这一系列事件,那么它的对手也会向左移动,计算机最终会得到一个价值 +2 的棋盘。很好,但计算机也许可以做得更好。所以现在我们需要考虑这样一种可能性,即计算机的第一步应该是将棋子放在最右边的列中。然而,在进行这项工作时,我们知道一些非常重要的事情:如果这条路径看起来会导致低于+2的分数,我们就可以停止分析。为什么?因为计算机已经有办法到达价值+2的棋盘,因此+2是计算机的“最坏情况”。计算机希望做得更好,但如果向右走会产生低于+2的结果,它肯定不会向右走。

But wait, there’s more. The computer now knows that if it starts this chain of events by dropping a checker in the left-most column, its rival will also go left, and the computer will end up with a board worth +2. That’s good, but the computer might be able to do better. So now we need to consider the possibility that the computer’s first move should instead be to drop its checker in the right-most column. As we do that work, however, we know something very important: if this path looks like it will lead to a score lower than +2, we can stop our analysis. Why? Because the computer already has a way to reach a gameboard worth +2 and thus +2 is the computer’s “worst case” scenario. The computer hopes to do better but it will certainly not go right if going right will deliver an outcome worth something less than +2.

考虑到这一点,我们来把情况推演一遍。如果计算机从右开始这个序列,我们需要知道它的对手会如何反应。这种反应取决于计算机下一步做什么,这会将我们带到棋盘 5 和 6。假设我们对棋盘 5 进行评分并发现它的得分为 -3。我们应该给第六个棋盘打分吗?是的,因为从计算机的角度来看,第六个棋盘可能会产生更好的结果。让我们假设它更好,给计算机一个价值+1的棋盘。基于这些假设,如果计算机第一步走右,而它的对手响应走左,则计算机下一步将走右并选择棋盘 6,获得+1。看到这种情况,对手就知道如果计算机从右开始这个序列,对手可以通过走左到达价值+1的棋盘。对手可能会做得更好,这取决于棋盘 7 和 8 的得分。但是如果计算机从右开始这个序列,对手的“最坏情况”就是向左走,并导致计算机选择棋盘 6 上可用的 +1。

With that in mind, play things out. If the computer starts this sequence by going right, we need to know how its rival will respond. That response turns on what the computer would do next, which takes us to gameboards 5 and 6. Suppose we score gameboard 5 and realize it delivers a score of −3. Should we bother to score the sixth gameboard? Yes, because that sixth gameboard might yield a better result from the computer’s perspective. Let’s suppose that it is better, giving the computer a board worth +1. On these assumptions, if the computer goes right for its first move and its rival goes left in response, the computer will go right on its next move and pick gameboard 6, earning +1. Seeing this, the rival knows that if the computer starts this sequence by going right, the rival can reach a board worth +1 by going left. The rival might do better, depending on how gameboards 7 and 8 score. But if the computer starts this sequence by going right, the rival’s “worst case” will be to go left and cause the computer to pick the +1 available at gameboard 6.

当然,计算机永远不会让这种情况发生。毕竟,计算机已经知道,如果它从左开始这个序列,它最终会得到一个价值 +2 的棋盘。因此,我们不需要分析第七和第八个棋盘,也不需要全面评估如果计算机以向右开始这个序列,其对手会怎么做。查看了棋盘 5 和 6 之后,我们知道计算机最好从向左开始这个序列,因为向左走会得到一个价值 +2 的棋盘,而向右走会得到一个价值 +1 的棋盘(如果对手向左走)或小于 +1 的棋盘(如果对手评估了棋盘 7 和 8 并决定向右走更好)。我们只对八个相关棋盘中的五个进行评分,就得出了这一结论。

Of course, the computer is never going to let that happen. After all, the computer already knows that if it starts this sequence by going left, it ends up with a gameboard worth +2. We therefore do not need to analyze the seventh and eighth gameboards, nor do we need to fully evaluate what the computer’s rival would do if the computer opened this sequence by going right. Having looked at gameboards 5 and 6, we know that the computer is better off starting this sequence by going left because going left yields a board worth +2 whereas going right will lead to a gameboard worth either +1 (if the rival goes left) or something less than +1 (if the rival evaluates gameboards 7 and 8 and decides that it is better off going right). And we figured that out by scoring only five of the eight relevant gameboards.
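The walkthrough above can be replayed in a few lines of Python. This is a minimal sketch, not the book's program: the nested-list tree, the function name `search`, and the scores assigned to boards 4, 7, and 8 are all assumptions made for illustration (the text never scores those three boards, which is exactly the point). The two `worst_case_*` arguments play the roles described in the text.

```python
# Scores for the eight boards in the walkthrough. Boards 4, 7, and 8 are never
# scored in the text, so the values given to them here are made-up placeholders.
BOARD_SCORES = {1: -4, 2: +2, 3: +6, 4: +8, 5: -3, 6: +1, 7: 0, 8: -5}

# Tree shape: computer's first move -> rival's reply -> computer's second move.
# Within each pair, the first entry is "left" and the second is "right".
TREE = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]


def search(node, computers_turn, worst_case_computer, worst_case_rival, scored):
    """Return the value of `node`; every board that gets scored is appended to `scored`."""
    if isinstance(node, int):  # a leaf, i.e. an actual gameboard
        scored.append(node)
        return BOARD_SCORES[node]
    if computers_turn:
        best = -1000
        for child in node:
            outcome = search(child, False, worst_case_computer, worst_case_rival, scored)
            best = max(best, outcome)
            if outcome > worst_case_rival:  # the rival would never allow this branch
                break
            worst_case_computer = max(worst_case_computer, outcome)
        return best
    best = 1000
    for child in node:
        outcome = search(child, True, worst_case_computer, worst_case_rival, scored)
        best = min(best, outcome)
        if outcome < worst_case_computer:  # the computer would never allow this branch
            break
        worst_case_rival = min(worst_case_rival, outcome)
    return best


scored_boards = []
value = search(TREE, True, -1000, 1000, scored_boards)
print(value, scored_boards)  # 2 [1, 2, 3, 5, 6]
```

Changing the placeholder scores for boards 4, 7, and 8 leaves both the result and the list of scored boards untouched, because the search never looks them up.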

所有这些都需要大量的文字来解释,但这种分析实际上是由一种直觉的双向动态驱动的。一旦对手确定了一组能够通向特定棋盘的走法,它就可以“修剪”那些从自身角度来看必然会导致相同或更糟糕结果的分支。对手不需要知道其他分支可能有多糟糕,也不需要知道有多少个等效分支可用。相反,一旦对手手上有选择,它就可以专注于那些它可能比已知棋盘更偏好的分支。

All of that requires a lot of words to explain but the analysis is in fact driven by an intuitive two-sided dynamic. Once the rival identifies a set of moves that will lead to a specific gameboard, it can “prune” any branches of the tree that are guaranteed to lead to either the same or a worse outcome from its own perspective. The rival does not need to know how much worse those other branches might be, nor does it need to know how many equivalent branches are available. Instead, once the rival has an option in hand, it can focus exclusively on branches that it might prefer over the board already known to be in reach.

类似地,一旦计算机找到一组能够通向特定棋盘的走法,它也可以修剪任何从其角度来看必然会导致相同或更糟糕结果的下游分支。与其对手一样,计算机无需知道其他路径可能有多糟糕,也无需知道有多少等效路径可用。一旦计算机掌握了选项,它就可以专注于那些它可能比已有的“最坏情况”选项更偏好的分支。

Similarly, once the computer finds a set of moves that will lead to a specific gameboard, it, too, can prune any downstream branch that is guaranteed to lead to either the same or a worse outcome from its perspective. Like its rival, the computer does not need to know how much worse those other paths might be or how many equivalent paths are available. Once the computer has an option in hand, the computer can focus exclusively on branches that it might prefer over its already-available “worst case” option.

我们可以将这些比较添加到现有程序中,令人震惊的是,只需几行代码即可完成。首先,向我们的FINDBESTMOVE()函数添加两个输入。将WORSTCASECOMPUTER定义为计算机迄今为止可证明能够取得的最高分数。计算机最终可能会得到比这更好的结果。但根据目前的分析,这是计算机可以合理接受的最坏结果。如果从此时开始的所有选项对计算机来说都不再有吸引力,计算机可以采取必要的行动,在最坏的情况下实现这个已经找到的结果。我们首先将WORSTCASECOMPUTER设置为 -1000,因为在游戏开始时,计算机在最坏的情况下也不过是弃权认输。同样,将WORSTCASERIVAL定义为对手根据已分析的行动可证明自己能够获得的最佳分数。从对手的角度来看,对手最终可能会得到更好的结果。但根据目前的分析,这是对手可能接受的最坏结果。如果从此刻开始的所有选项对对手来说都不再有吸引力,对手可以采取必要的行动,在最坏的情况下实现这个已经找到的结果。我们将WORSTCASERIVAL的初始值设为+1000,因为无论如何,在最坏的情况下,计算机的对手总是可以选择放弃比赛,并得到那个糟糕的分数。

We can add these comparisons to our existing program and, shockingly, it takes only a few lines of code to do it. Start by adding two inputs to our FINDBESTMOVE() function. Define WORSTCASECOMPUTER to be the highest score that the computer can provably achieve so far. The computer might end up with a better outcome than this. But based on the analysis so far, this is the worst outcome that the computer will plausibly accept. If all the options from this point forward turn out to be less attractive to the computer, the computer can make the moves necessary to achieve, at worst, this already-found outcome. We will set WORSTCASECOMPUTER to −1000 at first because, at the start of the game, the computer can at worst forfeit the game. Similarly, define WORSTCASERIVAL to be the best score that the rival can provably achieve for itself based on the moves already analyzed. The rival might end up with a better outcome from its perspective. But based on the analysis so far, this is the worst outcome that the rival will plausibly accept. If all the options from this point forward turn out to be less attractive to the rival, the rival can make the moves necessary to achieve, at worst, this already-found outcome. We will initiate WORSTCASERIVAL to be +1000 because, no matter what, at worst the computer’s rival can always choose to forfeit the game and earn that horrible score.

在FINDBESTMOVE()内部,我们将递归使用和更新这些值。从计算机的角度开始。每次计算机考虑移动时,我们现有的代码已经为生成的棋盘计算了一个分数并将其存储在变量POSSIBLEOUTCOME中。要实现修剪,我们只需做两处更改。首先,我们必须将该分数与当时的WORSTCASERIVAL进行比较。如果当前的POSSIBLEOUTCOME大于WORSTCASERIVAL(也就是说,如果正在考虑的结果对计算机来说比其对手已经可以选择的某个结果更好),我们应该停止分析并放弃该分支。一旦计算机找到了比对手已有结果更好的移动,就无需再分析其他棋盘。对手永远不会让计算机到达那个“对你更好,对我更糟”的棋盘。这是修剪步骤。其次,如果POSSIBLEOUTCOME未被修剪,我们需要将POSSIBLEOUTCOME与WORSTCASECOMPUTER进行比较。为什么?因为如果当前的POSSIBLEOUTCOME对计算机来说比其现有的最坏情况更好,我们应该更新WORSTCASECOMPUTER以反映这个新的、更好的“最坏情况”选项。将所有这些结合起来,我们得到下面的代码。

Inside FINDBESTMOVE(), we will then use and update these values recursively. Start from the computer’s perspective. Every time the computer considers a move, our existing code already calculates a score for the resulting gameboard and stores it in the variable POSSIBLEOUTCOME. To implement pruning, we need to make just two changes. First, we have to compare that score to the then-current WORSTCASERIVAL. If the current POSSIBLEOUTCOME is greater than WORSTCASERIVAL—that is, if the result being considered is better for the computer than some result that its rival can already pick—we should stop the analysis and abandon the branch. There is no need to analyze additional gameboards once the computer has found a move that is better than some outcome the rival already has in hand. The rival will never let the computer reach that “better for you, worse for me” gameboard. This is the pruning step. Second, if POSSIBLEOUTCOME is not pruned, we need to compare POSSIBLEOUTCOME to WORSTCASECOMPUTER. Why? Because if the current POSSIBLEOUTCOME is better for the computer than its existing worst case, we should update WORSTCASECOMPUTER to reflect this new, better “worst case” option. Putting all that together, we get the code below.

这解决了计算机的视角问题。要从对手的角度分析游戏,我们需要相似的代码,但关键变量要互换。因此,首先,我们需要将POSSIBLEOUTCOME与WORSTCASECOMPUTER进行比较。如果POSSIBLEOUTCOME小于WORSTCASECOMPUTER,我们就可以进行修剪。如果从对手的角度来看,结果比计算机之前在树中已经可以选择的某个结果要好,则无需继续沿着这条路径走下去。计算机永远不会允许对手到达这个“对对手有利,对计算机不利”的棋盘。同样,这是修剪步骤。其次,与我们之前的分析类似,如果这个POSSIBLEOUTCOME未被修剪,我们需要将POSSIBLEOUTCOME与WORSTCASERIVAL进行比较。如果当前的POSSIBLEOUTCOME对对手来说比其现有的最坏情况更好,我们需要更新WORSTCASERIVAL,以反映这个新的、更好的可能棋盘。最终代码如下,即上述函数的后半部分。

That takes care of the computer’s perspective. To analyze the game from the rival’s perspective, we need look-alike code but with the key variables flipped. So, first, we need to compare POSSIBLEOUTCOME to WORSTCASECOMPUTER. If POSSIBLEOUTCOME is smaller than WORSTCASECOMPUTER, we can prune. There is no need to continue down a path if the outcome is better from the rival’s perspective than some outcome that the computer can already pick earlier in the tree. The computer will never allow the rival to reach this “better for the rival, worse for the computer” gameboard. Again, this is the pruning step. Second, and also parallel to our earlier analysis, if this POSSIBLEOUTCOME is not pruned, we need to compare POSSIBLEOUTCOME to WORSTCASERIVAL. If the current POSSIBLEOUTCOME is better for the rival than its existing worst case, we need to update WORSTCASERIVAL to reflect this new, better possible board. The resulting code, which is the second half of the function shown above, follows.
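The two halves described above might be assembled roughly as follows. This is a sketch under stated assumptions rather than the book's listing: the Python names mirror FINDBESTMOVE(), POSSIBLEOUTCOME, WORSTCASECOMPUTER, and WORSTCASERIVAL, but the helper functions, the `prune` switch, the `counter` argument, and the three-move toy game (used only so the example runs without a Connect Four board) are all hypothetical.

```python
def legal_moves(board):
    """Toy game (hypothetical): the same three moves are always available."""
    return [0, 1, 2]


def apply_move(board, move):
    """A board in the toy game is just the tuple of moves made so far."""
    return board + (move,)


def score_board(board):
    """Arbitrary but repeatable score between -9 and +9 for the toy game."""
    value = 0
    for move in board:
        value = (value * 7 + move * 5 + 3) % 19
    return value - 9


def find_best_move(board, depth, computers_turn,
                   worst_case_computer=-1000, worst_case_rival=1000,
                   prune=True, counter=None):
    """Return (score, move); counter[0] counts how many boards were scored."""
    if depth == 0:
        if counter is not None:
            counter[0] += 1
        return score_board(board), None
    best_score, best_move = None, None
    for move in legal_moves(board):
        possible_outcome, _ = find_best_move(
            apply_move(board, move), depth - 1, not computers_turn,
            worst_case_computer, worst_case_rival, prune, counter)
        if computers_turn:
            if best_score is None or possible_outcome > best_score:
                best_score, best_move = possible_outcome, move
            if prune and possible_outcome > worst_case_rival:
                break  # pruning step: the rival will steer away from this branch
            worst_case_computer = max(worst_case_computer, possible_outcome)
        else:
            if best_score is None or possible_outcome < best_score:
                best_score, best_move = possible_outcome, move
            if prune and possible_outcome < worst_case_computer:
                break  # pruning step: the computer will steer away from this branch
            worst_case_rival = min(worst_case_rival, possible_outcome)
    return best_score, best_move


for depth in range(1, 6):
    without, with_pruning = [0], [0]
    plain = find_best_move((), depth, True, prune=False, counter=without)
    pruned = find_best_move((), depth, True, prune=True, counter=with_pruning)
    print(depth, plain == pruned, without[0], with_pruning[0])
```

The loop at the end compares the search with and without the two pruning checks: the chosen move and score agree at every depth, while the number of boards scored can only go down.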

就是这样。只需添加这几行代码,我们就可以显著减少计算机的工作量,而且不会牺牲计算机分析的质量。你可能会问,减少的程度有多大?我们很容易想到通过比较未剪枝搜索中评估的棋盘数量和剪枝搜索中评估的棋盘数量来描述这种变化的重要性。事实上,这些数字令人瞠目结舌。例如,在下面的图表中,我报告了对一个任意选取的样本游戏在深度 4 处先不剪枝、再剪枝进行分析时得到的这些数字。计算机做出了完全相同的举动,但最终需要评分的棋盘数量大约只有原来的四分之一。这令人印象深刻。

And that’s it. With just these few added lines of code, we can cut the computer’s work significantly, and we can do so without sacrificing anything in terms of the quality of the computer’s analysis. How significantly, you ask? It is tempting to describe the significance of the change by comparing the number of gameboards evaluated in a search without pruning to the number of gameboards evaluated in a search with pruning. And, indeed, those numbers are eye-popping. In the chart below, for instance, I report those numbers for an arbitrary sample game analyzed at depth 4 first without pruning and then with pruning. The computer makes the exact same moves but ends up needing to score roughly one-fourth as many gameboards. That’s impressive.

但我们实施修剪并非只是为了制作令人印象深刻的图表。我们这样做是因为我们想在不损失速度的情况下增加深度。也就是说,我们开始本章时的目标不是保持递归深度不变并简单地更快地识别相同的动作。我们的目标是大幅提高速度,以便增加深度,从而使计算机能够做出更好的决策。我们真的可以在这里做到这一点。为此,下表显示了当计算机的对手占据中间列并且现在轮到计算机迈出第一步时分析的棋盘数量。第一行显示当最大深度设置为 1 时分析的棋盘数量。第二行、第三行和后面的行显示当最大深度设置为 2、3 等时分析的棋盘数量。关键点:由于修剪,计算机现在可以在以前只能向前看四步的时间内向前看五步。

But we didn’t implement pruning just to make impressive charts. We did it because we wanted to increase depth without losing speed. That is, our goal when we began this chapter was not to keep the recursive depth constant and simply identify the same moves faster. Our goal instead was to increase speed so much that we could increase depth and in that way empower the computer to make even better decisions. And we really can do that here. To that end, the chart below shows the number of gameboards analyzed when the computer’s rival has taken the middle column and it is now the computer’s turn to make its first move. The first row shows the number of gameboards analyzed when the maximum depth is set to 1. The second, third, and later rows show the number of gameboards analyzed when the maximum depth is set to 2, 3, and so on. The key takeaway: thanks to pruning, the computer can look five moves ahead in the time it used to take to look just four moves ahead.

分别分析了使用和不使用剪枝的棋盘。无论哪种情况,计算机在相同情况下都会做出相同的举动。唯一的区别在于需要分析的棋盘数量。

Gameboards analyzed with and without pruning. Either way, the computer makes the same move in the same situation. The only difference is the number of gameboards that must be analyzed in order to find it.

运行不剪枝的极小极大算法的计算机需要评估 19,607 个棋盘才能以 4 的深度进行搜索。通过剪枝,计算机可以将深度增加到 5,因为即使在更高的深度下,计算机也只需要评估 17,116 个棋盘。

A computer running minimax without pruning needs to evaluate 19,607 gameboards in order to search at a depth of 4. With pruning, the computer can increase the depth to 5 because, even at that higher depth, the computer need evaluate only 17,116 boards.
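As a side note on where a number like 19,607 can come from: it equals 7 + 7² + 7³ + 7⁴ + 7⁵, the total number of boards in a tree where seven columns are open at every one of five levels. The snippet below only checks that arithmetic. How the program's depth setting maps onto the number of levels counted is an assumption here, not something the text states, and the function name is hypothetical.

```python
def boards_without_pruning(open_columns, levels):
    """Count every board in a full tree with `open_columns` moves at each level."""
    return sum(open_columns ** level for level in range(1, levels + 1))


print(boards_without_pruning(7, 5))  # 19607
```

Each extra level multiplies the work by about seven, which is why a fourfold saving from pruning is not quite enough to buy a full extra level on its own but gets close.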

章节挑战

Chapter Challenge

此 CodeLink 将带您进入一个完整可玩的四子棋版本,它实现了本章和上一章讨论的所有概念。代码中遍布注释,确保您能够理解众多变量名、循环和函数。而且,正如您将看到的,此版本的四子棋游戏速度非常快,而且性能出奇地好。您的挑战是进一步改进代码。

This CodeLink will take you to a fully playable version of Connect Four, one that implements all of the ideas discussed in both this chapter and the previous one. There are comments throughout the code to make sure you can follow the many variable names, loops, and functions. And, as you will see, this version plays a remarkably fast and surprisingly good game of Connect Four. Your challenge is to improve the code even further.

一个有希望的改进是允许代码改变其递归深度。目前,我们的代码在游戏开始时会向前推五步,在游戏过程中也会向前推五步,即使在游戏结束时也只向前推五步。更好的方法是随着游戏的进行增加最大递归深度,同时要认识到,随着已填充空间数量的增加,可能的棋盘数量会减少。换句话说,在游戏早期,我们可能别无选择,只能限制深度,因为即使深度为 5,也意味着要评估成千上万个棋盘。但在游戏后期,当深度为 5 时,可能只涉及 80 个棋盘时,我们应该增加最大递归深度,让计算机做出更好的决策。

One promising improvement would be to allow the code to vary its recursive depth. Right now, our code looks five moves ahead at the start of the game, five moves ahead in the middle of the game, and still just five moves ahead even toward the end of the game. A better approach would be to increase the maximum recursive depth as the game progresses, recognizing that the number of possible gameboards goes down as the number of filled spaces goes up. Put differently, we might have no choice but to limit depth early in the game, when even a depth of 5 means evaluating untold thousands of gameboards. But later in the game, when a depth of 5 might implicate just eighty boards, we should increase the maximum depth and let the computer make better decisions.

第二个改进机会来自于程序目前识别制胜走法的速度较慢。例如,如果计算机正在分析一个棋盘,如果棋盘上第7列落子即可获胜,那么计算机只有在递归评估所有其他走法之后才能确定这一点。也就是说,计算机会首先递归评估在第1列落子的可能性,然后再递归评估在第2列落子的可能性,依此类推。在此过程中,计算机可能会递归地考虑数百甚至数千个棋盘,即使最优走法就在那里,只差一步之遥。如果计算机能够在触发任何更深层次的递归评估之前以某种方式检查出轻松获胜的策略(“我能只用一步就赢吗?”),我们的代码运行速度就会更快。

A second opportunity for improvement comes from the fact that the program as it stands is slow to identify winning moves. For instance, if the computer is analyzing a gameboard where it can win by dropping a checker in column 7, the computer will figure that out but only after recursively evaluating all sorts of other moves. That is, the computer will first recursively evaluate the possibility of dropping a checker in column 1, then recursively evaluate the possibility of dropping a checker in column 2, and so on. Along the way, the computer will recursively consider potentially hundreds or even thousands of gameboards even though the optimal move was sitting right there, one drop away. Our code would run faster if the computer could somehow check for the easy win (“Can I win in just one move?”) prior to triggering any deeper, recursive evaluations.

第三个机会是同样的道理,只不过针对的是输棋。例如,如果计算机正在分析棋盘,发现下一步不可能取胜,但可能会输,计算机最终会决定阻挡,但在找到那个显而易见的走法之前,计算机很可能会评估数千个棋盘。计算机会首先递归地考虑在第一列落子,然后递归地考虑在第二列落子,依此类推,即使人类玩家会立即意识到现在阻挡是最佳选择。因此,如果计算机能够在触发任何更深层次的递归分析之前以某种方式检查必要的阻挡(“如果我赢不了,我的对手会在下一步获胜吗?”),我们的代码运行速度也会更快。

A third opportunity is the same point, but for losses. For example, if the computer is analyzing a board where it cannot possibly win on its next move but it could lose, the computer will ultimately decide to block, but it might well evaluate thousands of gameboards before finding that obvious play. The computer will first recursively consider dropping a checker in the first column, then recursively consider dropping a checker in the second column, and so on, even though a human player would immediately realize that blocking right now is the best choice. Here too, then, our code would run faster if the computer could somehow check for the necessary block (“If I cannot win, will my rival win in the next move?”) prior to triggering any deeper, recursive analysis.
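One possible shape for the second and third improvements is sketched below. It is not the CodeLink program: the board layout (six rows of seven cells, with the last row at the bottom), the piece markers, and the function names are assumptions chosen to keep the example self-contained. Columns are numbered from 0 here, whereas the text numbers them from 1.

```python
ROWS, COLS = 6, 7
EMPTY = "."


def drop(board, col, piece):
    """Return a copy of `board` with `piece` dropped in `col`, or None if the column is full."""
    for row in range(ROWS - 1, -1, -1):  # row ROWS - 1 is the bottom of the board
        if board[row][col] == EMPTY:
            new_board = [r[:] for r in board]
            new_board[row][col] = piece
            return new_board
    return None


def has_four(board, piece):
    """Check every horizontal, vertical, and diagonal line of four cells."""
    for row in range(ROWS):
        for col in range(COLS):
            for d_row, d_col in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_row, end_col = row + 3 * d_row, col + 3 * d_col
                if end_row < ROWS and 0 <= end_col < COLS:
                    if all(board[row + i * d_row][col + i * d_col] == piece
                           for i in range(4)):
                        return True
    return False


def quick_move(board, computer, rival):
    """Return a column that wins at once, else one that blocks an immediate loss, else None."""
    for piece in (computer, rival):  # look for wins first, then for necessary blocks
        for col in range(COLS):
            result = drop(board, col, piece)
            if result is not None and has_four(result, piece):
                return col
    return None


board = [[EMPTY] * COLS for _ in range(ROWS)]
for col in (0, 1, 2):
    board = drop(board, col, "X")  # the computer has three in a row along the bottom
print(quick_move(board, "X", "O"))  # 3
```

A caller would try `quick_move()` first and fall back to the recursive search only when it returns None. If the rival has two separate winning drops, blocking one of them does not save the game, so this check is a shortcut and not a substitute for the search.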

第四个改进机会源于这样一个事实:我们的代码通常从左向右移动,从第一列开始到第七列结束,尽管中间列通常更有价值。下棋时,你不会先考虑最左边的兵,然后再考虑倒数第二个兵,依此类推,直到所有棋子。相反,你肯定会从最有影响力的棋子开始,比如后或车。在这里,计算机可以类似地先评估中间列,然后再评估边,从而减少工作量。这样,计算机很可能会在分析中更早地找到更好的走法,从而可以更早地修剪次优棋盘。

A fourth opportunity for improvement derives from the fact that our code in general moves left to right, starting with the first column and ending with the seventh, even though the middle columns are typically more valuable. When you play chess, you do not first think about the left-most pawn, then think about the second-to-the-left pawn, and so on, through all the pieces. Instead, you surely start by looking at the most impactful pieces like the queen or a rook. Here, the computer could similarly cut down its work by evaluating the middle columns prior to evaluating the edges. In this way, the computer would likely find better moves earlier in its analysis, which would then allow for even earlier pruning of second-best boards.
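The column ordering itself is easy to generate. A minimal sketch, assuming columns are numbered from 0 (so column 3 is the middle of a seven-column board) and using a hypothetical function name:

```python
def center_first(columns=7):
    """List column indexes starting at the middle and working out toward the edges."""
    center = (columns - 1) / 2
    return sorted(range(columns), key=lambda col: abs(col - center))


print(center_first())  # [3, 2, 4, 1, 5, 0, 6]
```

Looping over this list in place of `range(7)` changes only the order in which moves are examined, not which moves are considered, so pruning stays exactly as safe as before.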

最后,第五个潜在的改进是将你刚刚编写的代码替换为更具分析性的版本。也就是说,在上一段中,我们假设中间列比边缘更好。但计算机可以根据实际棋盘情况做出决策,具体来说,就是为每个可能的第一步棋计算棋盘分数,然后根据这些初始的一步棋分数确定递归分析的优先级。例如,计算机可以在第一列落子并计算棋盘分数,在第二列落子并计算棋盘分数,以此类推,然后根据这一步棋的落子情况,询问七个选项中哪一个看起来最有希望。然后,计算机可以选择先递归地寻找更有希望的路径,然后再递归地寻找不太有希望的路径,因为它知道那些不太有希望的路径最终可能会被我们现有的剪枝代码剪掉。

Finally, a fifth potential improvement would involve replacing the code you just wrote with a more analytical version. That is, in the prior paragraph, we assumed that the middle columns are better than the edges. But the computer could make that decision based on the actual board in play, specifically by scoring the board for each potential first move and then prioritizing its recursive analysis based on those initial, one-move scores. For instance, the computer could drop a checker in the first column and score the board, drop a checker in the second column and score the board, and so on, and then ask which of the seven options looks most promising based on that one drop. The computer could then choose to recursively pursue the more promising paths before recursively pursuing the less promising ones, knowing that the less promising ones might well end up pruned by our existing pruning code.
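The same idea can be written as a sort on one-move scores. The sketch below is generic on purpose: `apply_move` and `score_board` stand in for whatever the reader's program already provides, and the stand-in game used to exercise the function is invented for the example.

```python
def order_moves(board, moves, apply_move, score_board, computers_turn=True):
    """Sort moves so the most promising one-move results are searched first."""
    return sorted(moves,
                  key=lambda move: score_board(apply_move(board, move)),
                  reverse=computers_turn)  # the rival prefers low scores instead


# Invented stand-in game: a board is the tuple of moves made so far, and the
# score depends only on the most recent move.
one_move_scores = {0: 1, 1: 5, 2: 3}
ordered = order_moves((), [0, 1, 2],
                      apply_move=lambda board, move: board + (move,),
                      score_board=lambda board: one_move_scores[board[-1]])
print(ordered)  # [1, 2, 0]
```

The extra scoring pass costs one board evaluation per candidate move, which is small next to the recursive work it can help prune.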

随机模拟

Random Simulation

7

7

投掷飞镖

Throwing Darts

两位数学家走进一家酒吧。其中一位拿起一张纸,画了一个内接于正方形的圆,圆的直径等于正方形边长。这位数学家请一位同事计算圆的面积占正方形面积的百分比。

Two mathematicians walk into a bar. One grabs a piece of paper and draws a circle inscribed in a square such that the diameter of the circle is equal to the length of the square’s side. The mathematician asks a colleague to calculate the area of the circle as a percentage of the area of the square.

第二位数学家笑了。“我知道公式。正方形的面积是通过将一条边的长度乘以另一条边的长度来计算的。要计算圆的面积,你需要计算圆半径的平方,然后乘以π。因为这个圆的半径是正方形边长的一半,所以在这个例子中,圆的面积大约是正方形面积的79%。”

The second mathematician smiles. “I know my formulas. The area of a square is calculated by multiplying one side’s length by the other. To calculate the area of a circle, you square the circle’s radius and multiply by π. Because the radius of this circle is half the length of the side of the square, the circle’s area in this example will be roughly 79 percent of the area of the square.”

第一位数学家深感佩服,又拿起一张纸,这次在正方形里画了一个内接三角形。三角形的底边位于正方形的一边,顶点一直延伸到正方形的对边。“这个怎么样?”第一位数学家问道。他的同事又笑了。“我也知道那个公式。三角形的面积是底乘以高的一半。因为这个三角形的底和高都等于正方形边长,所以这个三角形的面积是正方形面积的一半。”

Impressed, the first mathematician grabs a new piece of paper and this time inscribes a triangle in the square. The base of the triangle sits on one side of the square while the top of the triangle reaches all the way to the opposite side. “How about this one?” the first mathematician asks. The colleague smiles again. “I know that formula, too. The area of a triangle is calculated as one-half base times height. Because the base of this triangle and the height of this triangle both equal the length of the side of the square, the triangle’s area is one-half the area of the square.”

“很好,”最初的数学家一边说着,一边翻到最后一页。这一次,数学家在正方形里画了一个乱七八糟的图形,没有直线,没有可辨认的形状,与正方形的边也没有明显的关系。“那这里呢?当没有公式可以拯救你的时候,你会怎么做?”

“Very good,” acknowledges the original mathematician while reaching for one final page. This time, the mathematician draws a wild scribble of a figure inside the square, with no straight lines, no recognizable shape, and no obvious relationship to the sides of the square. “And here? What will you do when no formula can save you?”

酒吧后方,一位年轻人正在向靶盘掷飞镖。第二位数学家走近年轻人,简短交谈后,将新画的图案用胶带贴在墙上,遮住了年轻人的靶心。然后,数学家要求年轻人像之前一样掷飞镖。人群聚集时,数学家说道:“这位年轻人刚刚向那幅画掷了十二支飞镖。所有飞镖都落在画的正方形内,还有八支落在那个波浪线内。因此,我得出结论,波浪线的面积与正方形面积之比约为8比12,或者更简单地说,波浪线的面积大约是正方形面积的三分之二。”

Toward the back of the bar, a young man is throwing darts at a dartboard. The second mathematician approaches the young man and, after a brief conversation, tapes the new drawing to the wall such that it covers the young man’s target. The mathematician then asks the young man to throw his darts, just as before. As a crowd gathers, the mathematician speaks. “This young man just threw twelve darts at that drawing. All of them landed inside the drawn square, and eight also landed inside that squiggle. I therefore conclude that the ratio of the area of the squiggle to the area of the square is approximately 8 to 12, or, more simply, that the area of the squiggle is roughly two-thirds the area of the square.”

这段小插图中的数学家从一系列随机事件中获得了宝贵的信息。在这种情况下,数学家知道飞镖落入正方形的可能性与正方形的大小成正比。正方形越大,可能性越大。同样,数学家也知道飞镖落入曲线的可能性与曲线的大小成正比。同样,面积越大,可能性越大。因此,数学家可以通过比较这些概率来估计两个图形的相对大小。

The mathematician in this vignette learned valuable information from a series of random events. In this case, the mathematician knew that the likelihood of a dart landing inside the square would be in proportion to the size of the square. The bigger the square, the greater the odds. The mathematician likewise knew that the likelihood of a dart landing inside the squiggle would be in proportion to the size of the squiggle. Again, the larger the area, the greater the odds. Thus the mathematician could estimate the relative sizes of the two figures by comparing those probabilities.
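The dart trick is easy to try on a shape whose answer is already known. The sketch below throws simulated darts at the circle-in-a-square drawing from the start of the story. It assumes every dart lands uniformly at random inside the square (a real dart thrower is not that even-handed), and the function name and seed are arbitrary choices.

```python
import math
import random


def estimate_circle_fraction(darts, seed=0):
    """Estimate the share of a square covered by its inscribed circle by throwing darts."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(darts):
        x = rng.uniform(-1, 1)  # the square spans -1..1 in both directions
        y = rng.uniform(-1, 1)
        if x * x + y * y <= 1:  # inside the circle of radius 1
            hits += 1
    return hits / darts


print(estimate_circle_fraction(100_000))  # lands close to 0.785
print(math.pi / 4)                        # the exact fraction, about 0.7854
```

With only twelve darts, as in the bar, the estimate can easily be off by ten percentage points or more. The more darts, the tighter the estimate.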

诚然,所有这些都不如应用一些精确的、众所周知的数学公式准确。但在没有公式可应用,或者更传统的方法过于繁琐的情况下,一系列精心设计的随机事件可以对原本难以解决的问题给出出人意料的准确估计。而且在编写计算机算法时,这被证明是一种非常有用的见解。

All this is admittedly less accurate than applying some precise, well-known mathematical formula. But in cases where there is no formula to apply, or when more conventional approaches are unworkably cumbersome, a well-crafted series of random events can lead to a surprisingly good estimate for what would otherwise be a hard-to-solve problem. And that turns out to be an incredibly useful insight when it comes to writing computer algorithms.

从这个角度考虑一下益智游戏2048。两块砖头随机放置在四乘四的网格上。每块砖头有 90% 的机会标有 2,有 10% 的机会标有 4。当玩家向左、向右、向上或向下倾斜棋盘时,游戏就开始了,砖头会按选定的方向滑动。每块砖头在碰到棋盘边缘或另一块砖头时就会停止。如果两块数字相同的砖头相撞,则这些砖头会被移除,并由标有它们数字和的新砖头替换。然后,新砖头完成滑动,但在同一回合中不能与任何其他砖头组合。一旦所有砖头都根据这些规则组合并滑动完毕,该回合就完成,并且会将一块额外的砖头随机放置在棋盘上的空位上。同样,新牌有 90% 的可能性是 2,有 10% 的可能性是 4。游戏的目标是操纵棋盘,构建至少一个价值 2048 的牌。要做到这一点,玩家不仅必须随着时间的推移策略性地碰撞较小的牌,还要避免所有十六个空间都被填满,并且没有可行的碰撞能够创建新的空白空间的情况。

Consider in this light the puzzle game 2048. Two tiles are placed at random on a four-by-four grid. Each tile has a 90 percent chance of being labeled with a 2 and a 10 percent chance of being labeled with a 4. Play begins when the player tilts the board left, right, up, or down, causing the tiles to slide in the chosen direction. Each tile stops when it hits either the edge of the board or another tile. If two tiles with the same number collide, those tiles are removed and replaced by a new tile labeled with their sum. The new tile then finishes the slide, but it cannot combine with any other tiles on that same turn. Once the tiles have all combined and slid according to these rules, that turn is complete and an additional tile is placed at random in an empty location on the board. There is again a 90 percent chance that the new tile will be a 2 and a 10 percent chance that the new tile will be a 4. The goal of the game is to manipulate the board to build at least one tile worth 2048. To do that, a player must not only strategically collide smaller tiles over time but also avoid a situation where all sixteen spaces are filled and there is no plausible collision capable of creating a new empty space.
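The sliding rule is the fiddly part, so here is a minimal sketch of it for a single row tilted to the left. The function name and the use of 0 for an empty space are assumptions. So is the order of merging, which the text leaves open: pairs are combined starting from the side the tiles slide toward, as in the standard game.

```python
def slide_row_left(row):
    """Slide one row to the left, merging equal neighbors at most once per tile."""
    tiles = [tile for tile in row if tile != 0]  # drop the empty spaces
    result = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            result.append(tiles[i] * 2)  # two equal tiles become their sum
            i += 2                       # the new tile cannot merge again this turn
        else:
            result.append(tiles[i])
            i += 1
    return result + [0] * (len(row) - len(result))


print(slide_row_left([0, 2, 0, 2]))  # [4, 0, 0, 0]
print(slide_row_left([2, 2, 4, 0]))  # [4, 4, 0, 0]
print(slide_row_left([2, 2, 2, 2]))  # [4, 4, 0, 0]
```

The second example shows the "no second combination" rule: the freshly made 4 ends up next to the existing 4, but the two stay separate until a later turn.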

那么,计算机该如何玩2048呢?一种方法是使用类似于我们用于井字棋和四子棋的决策树。我们首先绘制一个节点来表示当前棋盘,并将其连接到四个下游选项:一个代表向上倾斜后的棋盘,一个代表向下倾斜后的棋盘,一个代表向左倾斜后的棋盘,一个代表向右倾斜后的棋盘。然后,我们会将每个下游选项连接到最多 30 个下游棋盘,因为棋盘上同时最多可能有 15 个空白处,并且倾斜后,每个空白处都可以用随机的 2 或 4 填充。在此基础上,我们将四个分支连接到每个下游棋盘,分别代表接下来的向上、向下、向左和向右倾斜。我们还会继续进行下去,添加更多分支来表示剩余的空白处,然后再添加更多分支来表示可能的倾斜,如此循环往复。

So how might a computer play 2048? One approach would be to use a decision tree similar to the ones we used for tic-tac-toe and Connect Four. We would start by drawing a node to represent the current gameboard and connecting it to four downstream options, with one representing that board after a tilt up, one representing that board after a tilt down, one representing it after a tilt left, and one representing it after a tilt right. To each of those, we would then connect up to thirty downstream gameboards because there can be as many as fifteen blank spaces on the board at any one time and, after a tilt, each of those spaces can be filled with either a random 2 or a random 4. From there, we would connect four branches to each of those gameboards, this time to represent the next tilts up, down, left, and right. And we would keep going beyond that, adding more branches to account for the remaining blank spaces and then more branches to account for the possible tilts, round after round after round.

那样就成了一棵大树了。事实上,这些数字很快就变得离谱了。仅仅两次倾斜之后,我们就已经拥有超过一万四千个棋盘需要评估。如果再增加第三个,我们就会面临超过一百万个棋盘。更糟糕的是,完成所有这些工作后,计算机只能提前规划两三步,即使一次成功的2048游戏通常需要大约一千步,才能反复将各个 2 和 4 翻倍足够多次,最终创建单个 2048 牌。

That would make for one big tree. Indeed, the numbers quickly become outlandish. After just two tilts, we already have more than fourteen thousand gameboards to evaluate. Dare to add a third and we would be looking at something over a million. Worse, doing all that work would allow the computer to plan only two or three moves ahead, even though a successful game of 2048 will typically require roughly one thousand moves in order to repeatedly double the various 2’s and 4’s enough times to ultimately create a single 2048 tile.
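Those counts follow from the branching just described, taking the upper limit at every step: four tilts, each followed by up to thirty ways to place the new tile. The constant and function names below are hypothetical, and the figures are ceilings, since most real boards have fewer than fifteen blank spaces.

```python
TILTS = 4                       # up, down, left, right
MAX_NEW_TILE_OUTCOMES = 15 * 2  # up to fifteen blank spaces, each filled with a 2 or a 4


def max_boards_after(tilts):
    """Upper limit on the number of distinct boards reachable after `tilts` tilts."""
    return (TILTS * MAX_NEW_TILE_OUTCOMES) ** tilts


for tilts in (1, 2, 3):
    print(tilts, max_boards_after(tilts))  # 120, then 14400, then 1728000
```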

这个示例游戏一开始,底行随机放置两个 2。向右倾斜棋盘会出现第二个棋盘,这两个 2 会组合成一个 4,然后第一行会随机添加一个 2。向下倾斜棋盘会出现第三个棋盘,2 和 4 会落到底行,并在它们旁边随机添加一个 2。剩余的棋盘会使用同一棋盘进行一系列额外的倾斜。

This sample game started with two 2’s placed randomly along the bottom row. Tilting the board right led to the second board, where those 2’s combined into a 4, and then a random 2 was added in the first row. Tilting that board down resulted in the third board, with the 2 and the 4 dropping to the bottom row and a random 2 being added alongside them. The remaining boards show a sequence of additional tilts using the same board.

所以我们需要一种更好的方法,比如飞镖。

So we need a better approach. Like darts.

考虑一种算法,我们将棋盘向上倾斜,然后随机地向四个方向倾斜,如此反复,直到棋盘死锁,没有空余空间,也不再可能发生碰撞。我们对棋盘进行计分。接下来,我们将棋盘恢复到原始状态,向上倾斜,并再次进行一系列随机移动,直到游戏陷入僵局。我们为棋盘计分。最终,我们总共进行了五十场随机游戏,每场游戏都从向上倾斜开始,之后完全随机进行。如果我们追踪并计算所有得分的平均值,最终就能对如果在游戏的这个阶段向上倾斜,游戏的走向有一个数值估计。

Consider an algorithm where we tilt the board up and then randomly tilt in any of the four directions, again and again, until the board deadlocks such that there are no empty spaces and no possible further collisions. We score the board. Next, we reset the board to its original condition, tilt up, and again make a series of random moves until the game deadlocks. We score that board. We ultimately play a total of fifty random games, each starting with an upward tilt but from there playing completely at random. If we track and average all of the resulting scores, we would ultimately have a numeric estimate for how the game will go if, at this point in the game, we tilt up.

从这里开始,我们可以再重复这个过程三次,其中一组随机游戏用于评估向下倾斜的可能性,另一组用于评估向左倾斜的可能性,第三组用于评估向右倾斜的可能性。接下来我们需要做的就是比较得到的平均值。如果向上倾斜的平均得分最高,我们就向上倾斜。如果向右倾斜的平均得分最高,我们就向右倾斜。无论随机模拟告诉我们什么,我们都会遵循,从计算机不完美但信息丰富的随机经验中学习。

From there, we can repeat the process three more times, with one group of random games being used to evaluate the possibility of tilting down, another being used to evaluate the possibility of tilting left, and a third being used to evaluate the possibility of tilting right. All we would need to do from there is compare the resulting averages. If the average score associated with tilting up is the highest, we tilt up. If the average score associated with tilting right is the highest, we tilt right. Whatever the random simulations teach, we follow, learning from the computer’s imperfect but informative random experience.
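Put together, the whole procedure fits in well under a hundred lines. The following is a sketch under several assumptions and not the book's code: boards are lists of four lists with 0 marking an empty space, every function name is invented, random play chooses only among tilts that actually change the board, and finished games are scored by summing the squares of the tiles, which is the scoring rule the text goes on to describe.

```python
import random

SIZE = 4
DIRECTIONS = ("up", "down", "left", "right")


def slide_row_left(row):
    """Slide one row left, merging equal neighbors at most once per tile."""
    tiles = [tile for tile in row if tile]
    result, i = [], 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            result.append(2 * tiles[i])
            i += 2
        else:
            result.append(tiles[i])
            i += 1
    return result + [0] * (SIZE - len(result))


def tilt(board, direction):
    """Return a new board showing `board` after a tilt in the given direction."""
    vertical = direction in ("up", "down")
    rows = [list(col) for col in zip(*board)] if vertical else [list(r) for r in board]
    if direction in ("right", "down"):
        rows = [slide_row_left(row[::-1])[::-1] for row in rows]
    else:
        rows = [slide_row_left(row) for row in rows]
    return [list(col) for col in zip(*rows)] if vertical else rows


def add_random_tile(board, rng):
    """Place a 2 (90 percent) or a 4 (10 percent) on a random empty space, in place."""
    empty = [(r, c) for r in range(SIZE) for c in range(SIZE) if board[r][c] == 0]
    r, c = rng.choice(empty)
    board[r][c] = 2 if rng.random() < 0.9 else 4


def legal_tilts(board):
    """Tilts that change the board; an empty list means the board is deadlocked."""
    current = [list(row) for row in board]
    return [d for d in DIRECTIONS if tilt(current, d) != current]


def score_board(board):
    return sum(tile * tile for row in board for tile in row)


def random_playout(board, first_tilt, rng):
    """Make `first_tilt`, then play at random until deadlock; return the final score."""
    board = tilt(board, first_tilt)
    add_random_tile(board, rng)
    while True:
        options = legal_tilts(board)
        if not options:
            return score_board(board)
        board = tilt(board, rng.choice(options))
        add_random_tile(board, rng)


def choose_tilt(board, games_per_tilt=50, seed=None):
    """Return (best first tilt, average score per tilt) from random simulations."""
    rng = random.Random(seed)
    averages = {}
    for direction in legal_tilts(board):
        total = sum(random_playout(board, direction, rng) for _ in range(games_per_tilt))
        averages[direction] = total / games_per_tilt
    best = max(averages, key=averages.get) if averages else None
    return best, averages


start = [[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [2, 2, 0, 0]]
best, averages = choose_tilt(start, games_per_tilt=50, seed=1)
print(best, averages)
```

Because the games are random, the winning direction can change from one seed to the next when the averages are close. Raising `games_per_tilt` steadies the estimate at the cost of more simulated games.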

Now there are obvious disadvantages to this haphazard approach. Most notably, the computer will make downright foolish moves in many of the simulated games, randomly tilting in one direction even when some other direction is clearly a better choice. But there is an offsetting benefit: by skipping the computationally intensive process of evaluating each move in isolation, the computer can quickly simulate (in this example) two hundred random games, gaining valuable insight into the big-picture question of whether an initial tilt left, right, up, or down is more likely to lead to a promising result. In essence, the algorithm trades precision for depth, recognizing that a blurry picture at great depth might in this instance be significantly more valuable than a clear picture at shallow depth.

But how well does this approach really work? To answer that question, let’s write the code. Start with the new SCOREBOARD() function. One simple way to score a 2048 gameboard is to take every number on the board, square it, and report back the sum of those squares. This math significantly favors boards that sport large numbers over boards that include only small ones because squaring numbers magnifies any gap between them. For instance, under this approach, a board with a single 1024 tile would earn a higher score than a similar board with two 512 tiles, and a board with two 512 tiles would earn a higher score than a board with four, five, or even six 64 tiles.
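
To make the arithmetic concrete, here is a minimal sketch of that scoring rule. It assumes the board is stored as a list of rows with 0 marking an empty square; that representation, and the `board_with()` helper used to build the examples, are illustrative choices rather than anything taken from the CodeLink.

```python
def score_board(board):
    """Score a board as the sum of the squares of its tiles (0 marks an empty square)."""
    return sum(tile * tile for row in board for tile in row)


def board_with(tiles, size=4):
    """Hypothetical helper: lay the given tiles out row by row on an otherwise empty board."""
    cells = list(tiles) + [0] * (size * size - len(tiles))
    return [cells[r * size:(r + 1) * size] for r in range(size)]


one_1024 = board_with([1024])
two_512 = board_with([512, 512])
six_64 = board_with([64] * 6)

print(score_board(one_1024))  # 1048576
print(score_board(two_512))   # 524288
print(score_board(six_64))    # 24576
```

The single 1024 tile scores twice as much as the pair of 512 tiles, even though the tiles on both boards add up to the same total.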

For each move, the computer simulates fifty random games. The computer will ultimately pick the move that performs best on average.

Admittedly, there are better, more sophisticated ways to score a 2048 board; for now, however, my goal is simply to choose a plausible approach that the computer can implement quickly. This way the computer can focus its resources on the algorithm’s core concept: running large numbers of random, simulated games in order to roughly evaluate the four possible moves at hand.

Next up is a new function, PLAYRANDOMLY(), that takes the current gameboard as input and then, as the name implies, randomly tilts the board again and again until the board deadlocks. This function then returns the score for that final, deadlocked board using SCOREBOARD(). The code snippet below shows what the code might look like. Note that I use some game-specific functions like PLACERANDOMTILE(), TILTUP(), and TILTDOWN() to implement the basic gameplay, and I also wrote an INDEADLOCK() function that returns TRUE if there are no empty spaces on the board and no tilt can possibly create one. There is nothing tricky about any of those functions, so I save those details for the CodeLink and include just the PLAYRANDOMLY() code here.
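
As one possible rendering in Python, the sketch below fills in small stand-ins for the helper functions so that it runs on its own. The single `tilt()` function (in place of separate TILTUP(), TILTDOWN(), and so on), the list-of-rows board, and the rule that every new tile is a 2 are all assumptions made for this sketch; the CodeLink version may differ in each of those details.

```python
import random

DIRECTIONS = ("up", "down", "left", "right")


def slide_row_left(row):
    """Slide one row left, merging each pair of equal neighbors at most once."""
    tiles = [t for t in row if t]
    out = []
    i = 0
    while i < len(tiles):
        if i + 1 < len(tiles) and tiles[i] == tiles[i + 1]:
            out.append(tiles[i] * 2)
            i += 2
        else:
            out.append(tiles[i])
            i += 1
    return out + [0] * (len(row) - len(out))


def transpose(board):
    return [list(col) for col in zip(*board)]


def tilt(board, direction):
    """Return a new board tilted in the given direction; the input is not modified."""
    if direction == "left":
        return [slide_row_left(row) for row in board]
    if direction == "right":
        return [slide_row_left(row[::-1])[::-1] for row in board]
    if direction == "up":
        return transpose(tilt(transpose(board), "left"))
    if direction == "down":
        return transpose(tilt(transpose(board), "right"))
    raise ValueError(f"unknown direction: {direction}")


def in_deadlock(board):
    """True when no tilt changes the board, i.e., it is full and nothing can merge."""
    return all(tilt(board, d) == board for d in DIRECTIONS)


def place_random_tile(board, rng):
    """Put a 2 on a randomly chosen empty square (modifies the board in place)."""
    empty = [(r, c) for r, row in enumerate(board) for c, t in enumerate(row) if t == 0]
    r, c = rng.choice(empty)
    board[r][c] = 2


def score_board(board):
    return sum(t * t for row in board for t in row)


def play_randomly(board, rng=random):
    """Tilt at random until the board deadlocks, then return the final board's score."""
    board = [row[:] for row in board]          # work on a copy of the caller's board
    while not in_deadlock(board):
        tilted = tilt(board, rng.choice(DIRECTIONS))
        if tilted != board:                    # a tilt that moves nothing adds no tile
            board = tilted
            place_random_tile(board, rng)
    return score_board(board)
```

Because `play_randomly()` copies the board before it starts, the same starting position can be handed to it again and again, which is exactly what the simulations require.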

Finally, we need a FINDBESTMOVE() function that takes the current gameboard as input and returns as output a suggestion to tilt up, down, left, or right. This one starts by tilting the current board up, simulating fifty games using the PLAYRANDOMLY() function, and calculating the average score for those fifty games. Then the function tilts the original gameboard down, simulates fifty games using PLAYRANDOMLY(), and calculates the average score for those fifty games. The code then does the same work for left and right, ultimately suggesting the move that yields the highest average score based on those two hundred simulations.
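
A sketch of that averaging logic might look like the following. To keep it short and runnable by itself, the game-specific pieces are passed in as parameters: `apply_move` and `play_randomly` are stand-ins for the tilt functions and for PLAYRANDOMLY(), and the toy "game" at the bottom exists only to exercise the function. None of those names come from the CodeLink.

```python
import random


def find_best_move(board, moves, apply_move, play_randomly, games_per_move=50):
    """Return (best_move, averages); averages maps each move to its mean simulated score."""
    averages = {}
    for move in moves:
        after_move = apply_move(board, move)   # the original board is left untouched
        total = sum(play_randomly(after_move) for _ in range(games_per_move))
        averages[move] = total / games_per_move
    best_move = max(averages, key=averages.get)
    return best_move, averages


# Toy stand-in for the game, purely to exercise the function: a "board" is a number,
# each move shifts it by a fixed amount, and a random game adds a little noise.
rng = random.Random(1)
SHIFT = {"up": 0, "down": 10, "left": 5, "right": -5}
toy_apply = lambda board, move: board + SHIFT[move]
toy_play = lambda board: board + rng.uniform(-1, 1)

best, averages = find_best_move(100, SHIFT, toy_apply, toy_play)
print(best)  # down
```

In a real 2048 implementation, `apply_move` would presumably also place the random tile that follows each tilt, and a tilt that leaves the board unchanged would probably be skipped rather than simulated.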

Ready to judge this approach? This CodeLink will take you to the full darts implementation. Run it a few times and watch as the computer analyzes, tilts, analyzes, and tilts again. Does the computer make any painfully ridiculous moves? How often does the computer successfully build a tile worth at least 2048?

The chart that follows reports my own experience running this code. On the far left are the results I obtained when I programmed the computer to randomly tilt the board without using any algorithm or logic. As the chart shows, 93 percent of the time this completely random approach deadlocked at 128 or less. Next are the results I obtained with a minimax approach, depth set to 3. That column reports better outcomes: 60 percent of the time the computer deadlocked with its highest tile showing 1024, and 15 percent of the time the computer made it all the way to 2048. But now look at the data for the darts implementation. There, the computer won the game a remarkable 43 percent of the time, building at least one 2048 tile.

Chapter Challenge

Random simulation seems to work well, but your challenge in this chapter is to figure out how well. Start with the computer’s earliest decisions. Does it really matter if the computer simulates those decisions rather than just randomly tilting the board? One theory is that it must matter because those early moves set the stage for the moves that follow. Another theory, however, is that the early moves are noise, given that they will inevitably be followed by hundreds of later, more important tilts. Can you edit the code to intentionally choose poorly in the beginning and see how impactful that change really is?

Next, return to our original code and focus on the later moves in the game. When the board is pretty packed, or when the numbers are pretty big, should the computer abandon the random approach and at that point pursue a depth-limited minimax search? Intuitively, a tree that models the end of the game should be a manageable size, which means the computer can plausibly do the work rather than relying on a random approach. Can you edit the code to see how random and purposeful approaches might be combined as the game nears its conclusion?

8

Aiming Darts

Last chapter, we used a set of organized but random simulations to pick moves in the game 2048. We adopted this random approach because the full game tree for 2048 is enormous. Rather than attempting to create and evaluate it, we instructed the computer to focus on the four choices immediately at hand, run as many simulations as possible given realistic constraints, and then make the choice that worked best, on average, across those simulations. The computer’s performance turned out to be reasonably good. In the tests reported at the end of the chapter, the random approach outplayed several respectable baselines. Still, there is significant room for improvement.

One issue is that our code last chapter treated each of the four options as equally worthy, regardless of whether that was plausibly true. When we allowed the computer to run two hundred simulations prior to making a move, for instance, we structured the code such that fifty of those simulations were used to explore the possibility of tilting up, fifty down, fifty right, and fifty left. And, awkwardly, we stuck with that balanced allocation even when it was clear that one of those options was a total loser. For example, if tilting left looked like an unmitigated disaster the first forty times we simulated it, we nevertheless remained blindly optimistic; we simulated that awful path ten more times, not even considering whether those last ten simulations could have been better used to sharpen the pros and cons of other, more plausible options. In this chapter, we will do better. We’ll pay attention to simulation results as we get them, always deciding where to invest the next simulation based on the results obtained from prior ones.

A second area for potential improvement is that our code last time used simple averages, even though averages can be extremely misleading. Consider a situation where the computer is thinking about tilting right and thus runs one hundred simulations where it tilts right and then randomly moves up, down, left, or right. If seventy-five of those simulations were to come back with terrible results, and twenty-five were to come back with fantastic results, our code from last chapter would characterize a rightward tilt as unattractive overall. The seventy-five bad outcomes would count heavily against tilting right, and they would water down the twenty-five good ones. But what if it turned out that all seventy-five of those losses were associated with the moves right-then-up, right-then-left, and right-then-right, while all twenty-five successful outcomes were reliably of the form right-then-down? Tilting right would be a great move on that assumption because it would set the stage for that very promising tilt down, but last chapter’s code would never suggest it. This chapter, we will therefore keep track of this next level of nuance, categorizing each simulation based on more than just the one move with which it begins.

Both of these improvements are in essence ideas about investing simulations strategically. If the computer has time to run only two hundred simulations, we might want to spend the first simulation tilting left, the second tilting right, the third tilting up, and the fourth tilting down. But choosing how best to spend the fifth simulation is a strategic question that we utterly ignored last chapter. Should we take a second look at the option that currently seems most promising? The one that seems least promising? And what about the next simulation after that? Should we again focus on either the most or least promising option? Should we instead spread our attention more broadly, maybe studying a move that has been comparatively less explored, or focusing on a two-step or three-step option like up-then-right or up-then-right-then-down?

In short, last chapter we simply threw darts. This chapter we aim them. To understand how, let’s consider a new game: the card game Blackjack. In our version of the game there are just two players, the computer and the dealer. Each chooses some number of random cards from a deck, and, after choosing those cards, they compare their resulting hands to determine which player has come closest to 21 without exceeding that total. For the computer, the strategic challenge comes in choosing how many cards to draw. The answer depends not only on the computer’s prediction as to the likelihood of receiving a next card that will help the computer move closer to, but not exceed, 21, but also on the computer’s prediction as to the total the dealer will achieve when the dealer draws its cards.

Gameplay works as follows. The computer randomly draws one card and reveals it, such that both the computer and the dealer know its value. The dealer then draws a random card, also fully revealed. The computer next draws a second random card for its hand, and the dealer draws a second random card for its hand, but this time only the computer’s card is public. The value of the dealer’s second card is established but secret. Not even the dealer knows that card’s value.

With those initial cards dealt, the computer is now repeatedly given the option of either taking an additional card or holding firm with the cards it already has in its hand. Again, the computer’s goal is to draw cards that add up to a number that is greater than whatever the dealer will ultimately have but in no case greater than 21. And the computer has some information relevant to that question. The computer knows that the game started with a full deck of fifty-two cards. The computer also knows the value of the two cards in its public hand and the value of the one public card held by the dealer. If the computer takes a card and ends up with a hand worth more than 21, the computer is said to have “busted” and as a result immediately loses, regardless of what cards the dealer holds. And, at any time, the computer can decline to take additional cards and in that way choose to “stand” on the cards already in its possession.

After the computer stands, the dealer takes its turn. The dealer first reveals its hidden card and then is given the chance to draw additional cards. The dealer’s goal is to achieve a total that is greater than the computer’s total but no greater than 21. That said, because the dealer at this point knows the computer’s final total and hence enjoys an unfair advantage in its decision-making, the dealer is required to play a simple, ignore-the-facts strategy: the dealer must take additional cards until the value of its cards adds up to 17 or more; then, no matter what, the dealer must stand.

Some sample games are shown below. Note that a card’s suit does not matter. Hearts are the same as spades are the same as clubs. Moreover, kings, queens, and jacks are all worth 10, and a player can treat an ace as worth either 1 or 11. And while in some versions of the game a two-card hand that totals to 21 is deemed to beat any larger hand that also totals to 21, for our purposes let’s simplify by focusing only on the value of each hand, regardless of how many cards are used.

In the sample game on the left, the computer initially had a hand worth 13 and knew the dealer had 10 showing. The computer took another card and then stopped, holding firm at 18. The dealer then revealed its hidden card, which brought the dealer’s total to 19; because 19 is greater than 18 but does not exceed 21, the dealer won the round. In the game on the right, the computer started with a hand already worth 17 and so did not take any additional cards. The dealer’s hidden card was then revealed to be a 4, so the dealer took another card. That turned out poorly for the dealer, however, because that additional card took the dealer to a total greater than 21. The dealer thus busted, losing the round.

Before we implement this chapter’s aim-the-darts strategy in the context of this new game, let’s first quickly teach the computer to play Blackjack using last chapter’s simulation approach. This will help us later, when we want to understand how much better the new strategy is as compared to the old one.

Let’s use the function MAKEDECK() to create the pile of cards that are still available to be played. This function takes as input the cards already in the current COMPUTERHAND and DEALERHAND and then also takes an indication of whether one of the dealer’s cards is still hidden. That detail is important because, when the computer moves, the computer does not know anything about the dealer’s hidden card and hence has to simulate the game as if that card is still available in the deck. Let’s also make a NEWCARD() function that randomly draws one of those available cards, a MAKEHAND() function that combines a player’s cards into a hand that we can study in various simulations, and a VALUEHAND() function that reports the total value of a given hand by translating kings, queens, and jacks to 10 and aces to either 1 or 11.

Two more functions then finish implementing the game itself. A SCOREGAME() function compares COMPUTERHAND to DEALERHAND and returns +1 if the computer wins that round, −1 if the dealer wins that round, and 0 for a tie. And PLAYTO17() adds random cards to DEALERHAND until the value of the hand is 17 or more. Remember, the dealer is required to play this way in order to keep the game fair. If the dealer were instead allowed to make dynamic decisions, the dealer would look at the computer’s hand and keep drawing cards until the dealer either won or busted. Forcing the dealer to always PLAYTO17() takes away this advantage.
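
Here is a rough Python sketch of those helper functions. Several details are assumptions made for illustration: cards are stored as rank strings such as "7" or "K", the dealer's hidden card (when there is one) is listed last in the dealer's hand, and MAKEHAND() is left out because a plain list already serves as a hand.

```python
import random

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
POINTS = {"A": 11, "J": 10, "Q": 10, "K": 10, **{str(n): n for n in range(2, 11)}}


def make_deck(computer_hand, dealer_hand, dealer_card_hidden):
    """Return the cards that could still be drawn, as far as the computer can tell."""
    deck = RANKS * 4                       # suits are irrelevant, so four copies per rank
    known = list(computer_hand) + list(dealer_hand)
    if dealer_card_hidden:
        known.pop()                        # assumption: the hidden card is listed last
    for card in known:
        deck.remove(card)
    return deck


def new_card(deck, rng=random):
    """Remove and return one randomly chosen card from the deck."""
    return deck.pop(rng.randrange(len(deck)))


def value_hand(hand):
    """Total a hand, counting each ace as 11 unless that would push the hand past 21."""
    total = sum(POINTS[card] for card in hand)
    aces = hand.count("A")
    while total > 21 and aces:
        total -= 10                        # demote one ace from 11 to 1
        aces -= 1
    return total


def play_to_17(dealer_hand, deck, rng=random):
    """Add random cards to the dealer's hand until it is worth 17 or more."""
    while value_hand(dealer_hand) < 17:
        dealer_hand.append(new_card(deck, rng))
    return dealer_hand


def score_game(computer_hand, dealer_hand):
    """Return +1 if the computer wins, -1 if the dealer wins, and 0 for a tie."""
    computer, dealer = value_hand(computer_hand), value_hand(dealer_hand)
    if computer > 21:
        return -1                          # a busted computer loses no matter what
    if dealer > 21:
        return 1
    return (computer > dealer) - (computer < dealer)
```

For example, `score_game(["10", "8"], ["10", "9"])` compares 18 against 19 and reports a loss for the computer, while `score_game(["10", "7"], ["10", "4", "K"])` reports a win because the dealer busts at 24.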

The critical function is then PLAYRANDOMLY(), which, as it did last chapter, simulates one game to completion. After setting up the different variables, this version of PLAYRANDOMLY() opens by choosing between either taking another card or standing. If the function randomly chooses to stand, the computer’s turn is over, and the dealer plays out its turn by taking cards until its hand reaches a total value of 17 or more. If the function randomly chooses to draw, however, a new card is added to the computer’s hand and then the cycle repeats, with the function again randomly choosing whether to draw or stand. The process stops when the computer either hits a perfect score of 21 or has cards adding up to a disqualifying score of 22 or more. After the dealer plays, SCOREGAME() tells us whether the computer or dealer won.

FINDBESTMOVE() then uses PLAYRANDOMLY() to simulate games and, through that, identify the computer’s best move. The function first evaluates the possibility of adding one card to COMPUTERHAND. Specifically, for half of the available simulations, the function adds one card, randomly plays the rest of the game, then removes that added card. For the other half, no cards are added to the computer’s original hand and so the simulations simply play out the game for the dealer. After all the simulations are done, FINDBESTMOVE() looks at the average outcome associated with taking a card and the average outcome associated with refusing further cards. The function returns the advice of TAKE if taking a card performed better on average, and STAND otherwise.
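
The two functions just described could be sketched as follows. This is a simplified, self-contained version and not the CodeLink code: the helpers are folded in compactly, cards are rank strings, the dealer's visible card is passed separately as `dealer_up_card`, and a hypothetical `may_draw` flag switches off the computer's random draws for the "stand" half of the simulations.

```python
import random

RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
POINTS = {"A": 11, "J": 10, "Q": 10, "K": 10, **{str(n): n for n in range(2, 11)}}


def value_hand(hand):
    """Total a hand, counting each ace as 11 unless that would push the hand past 21."""
    total = sum(POINTS[card] for card in hand)
    aces = hand.count("A")
    while total > 21 and aces:
        total -= 10
        aces -= 1
    return total


def make_deck(known_cards):
    """A 52-card deck minus the cards the computer can see."""
    deck = RANKS * 4
    for card in known_cards:
        deck.remove(card)
    return deck


def play_randomly(computer_hand, dealer_up_card, rng, may_draw=True):
    """Simulate one game to completion; return +1, 0, or -1 from the computer's side."""
    hand = list(computer_hand)
    deck = make_deck(hand + [dealer_up_card])   # the hidden card counts as still in the deck
    # Flip a coin between drawing and standing; stop automatically at 21 or more.
    while may_draw and value_hand(hand) < 21 and rng.random() < 0.5:
        hand.append(deck.pop(rng.randrange(len(deck))))
    dealer = [dealer_up_card, deck.pop(rng.randrange(len(deck)))]  # simulated hidden card
    while value_hand(dealer) < 17:              # the dealer's fixed play-to-17 rule
        dealer.append(deck.pop(rng.randrange(len(deck))))
    computer, house = value_hand(hand), value_hand(dealer)
    if computer > 21:
        return -1
    if house > 21:
        return 1
    return (computer > house) - (computer < house)


def find_best_move(computer_hand, dealer_up_card, simulations=200, rng=random):
    """Spend half the simulations on taking a card and half on standing; compare averages."""
    half = simulations // 2
    take_total = 0
    for _ in range(half):
        deck = make_deck(list(computer_hand) + [dealer_up_card])
        with_card = list(computer_hand) + [deck[rng.randrange(len(deck))]]
        take_total += play_randomly(with_card, dealer_up_card, rng)
    stand_total = sum(
        play_randomly(computer_hand, dealer_up_card, rng, may_draw=False)
        for _ in range(half)
    )
    return "TAKE" if take_total / half > stand_total / half else "STAND"


print(find_best_move(["K", "Q"], "7", 400, random.Random(7)))  # STAND
```

With a hand already worth 20, almost every extra card busts, so even these crude random simulations point firmly toward standing.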

Now for this chapter’s improvements.

Our code thus far conceptualizes Blackjack as a series of simple, single-layer decision trees. That is, the function FINDBESTMOVE() is called at a time when the computer is holding some number of cards. The function first follows a hypothetical left branch, takes one card, and runs its simulations. Then the function follows a hypothetical right branch, refuses any cards, and runs its simulations. The implicit tree thus has just three nodes: the current hand, that hand after the computer takes a card, and that hand after the computer stands.

This simplicity is by design. Remember, the reason we started using random simulations in the first place was to avoid the complexity of large, potentially overwhelming trees. And for Blackjack, that’s a good thing. A full Blackjack tree would start with twelve branches, one representing “stand” and eleven representing the possibility of drawing cards with the values 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, and 11, respectively. From there, the resulting nodes would each need twelve further branches, in that way representing two-step paths like “draw a 3 then draw a 9,” or “draw jack then stand pat.” Were we to continue, the result would be an unworkable mess, so we simplified. But maybe we overdid it.

Consider the tree shown below, which is slightly more complicated than our implicit tree but still extremely manageable. The nodes on this fuller tree are of two types. On the left, the nodes numbered 2, 4, 6, and so on are nodes that the computer reaches after it takes a card. From those, the computer can either take another card or stand. On the right, the nodes numbered 3, 5, 7, and so on are nodes where the computer refuses additional cards. From those, the computer has no further choices to make and hence there are no “children” along these branches. The dealer is still allowed to take cards but the computer is done.

Now imagine that we are ready to run our first simulation using this slightly more complete tree. Given that we have no information at all, let’s arbitrarily begin by simulating a game at node 2. That is, let’s have the computer draw one card, thus moving to node 2, and then simulate the rest of the game randomly. If that simulation goes well, we might think that, given our current hand, drawing one more card might be a good move. If that simulation goes poorly, we probably shouldn’t be too judgmental, but we might begin to think that holding firm would be the better call.

Where should we go with our next simulation? Remember, the question we are trying to answer is whether, when the time comes, the computer should take a card and thereby move to node 2 or stand firm and thus move to node 3. Given that, it might make sense to run our next simulation at node 3, a node we have not even visited yet. If that simulation goes well compared to our prior simulation, maybe we will lean ever so slightly toward refusing additional cards. If it goes poorly, we might begin to think that taking a card is the better call.

Okay, what about the next simulation? Obviously, we could use that to get more information about node 3. We might want to do that if we think the first simulation was a fluke. If we decide not to simulate at node 3, however, it would be silly to run another simulation at node 2. Why? Because simulating a game at either node 4 or node 5 would give us more information. We would still learn something about the pros and cons of moving to node 2 because the only way to reach nodes 4 and 5 is to first move to node 2. But we would also learn something about the possibility of taking a card and then either taking another card (node 4) or stopping (node 5). Remember, gathering extra information like this is one of the reasons we are trying to aim the darts in this chapter rather than just throwing them at random like we did in the prior chapter. If it turns out that take-then-take-then-stand is a great strategy given the cards in play but take-then-stand is a disaster, we want to be able to distinguish those two approaches even though both begin by taking a card to reach node 2.

The next diagram updates our game tree with the information gleaned from fifty simulations. In each node, the top number is the sum of the SCOREGAME() results relevant to that node. So, for example, if the computer ran a simulation where it took a card to reach node 2, took another card to reach node 4, and then lost the game in a simulation played at node 4, nodes 1, 2, and 4 would each see their scores drop by 1 point. If the computer ran a simulation where it took a card to reach node 2, declined to take any further cards and thus reached node 5, and then won, nodes 1, 2, and 5 would each see their scores increase by 1 point.

The bottom number in each node is then the total number of times a simulation involved that node. In the examples just described, the tallies for nodes 1, 2, and 4 would go up thanks to the first simulated path, and the tallies for nodes 1, 2, and 5 would go up thanks to the second simulated path. Note that the visit numbers are intentionally off by one as you move up the tree. Node 2, for instance, is visited once by itself and then visited in addition every time nodes 4 and 5 are visited. Hence node 2 in this drawing shows 13 visits even though nodes 4 and 5 explicitly account for only 12 of them.
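
The bookkeeping for those two numbers can be sketched in a few lines. The `Node` class and `record_result()` function below are hypothetical names chosen for this illustration; the idea is simply that each result is credited to the node where the simulation ran and to every ancestor above it.

```python
class Node:
    """One node in the decision tree: a running score, a visit count, and a link to its parent."""

    def __init__(self, label, parent=None):
        self.label = label
        self.parent = parent
        self.score = 0     # top number: sum of the +1 / 0 / -1 results
        self.visits = 0    # bottom number: how many simulations involved this node


def record_result(node, result):
    """Add one simulation result to the node where it ran and to every ancestor up to the root."""
    while node is not None:
        node.score += result
        node.visits += 1
        node = node.parent


node1 = Node(1)
node2, node3 = Node(2, node1), Node(3, node1)
node4, node5 = Node(4, node2), Node(5, node2)

record_result(node4, -1)   # take, take, then lose: nodes 1, 2, and 4 each drop by 1
record_result(node5, +1)   # take, stand, then win: nodes 1, 2, and 5 each rise by 1

for n in (node1, node2, node3, node4, node5):
    print(n.label, n.score, n.visits)
```

The off-by-one pattern falls out of the same loop: a simulation run at node 2 itself raises node 2's count without touching nodes 4 or 5.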

Okay, you know the drill. Where should we invest our next simulation given this more complicated scenario? Consider the flowchart shown below. The chart starts at some node of interest, which will initially be node 1 but might later be any node in the tree. We first ask whether the node of interest is a “stand” node, like nodes 3, 5, and 7. If so, there is only one simulation possible: a game where the computer stands and the dealer plays to at least 17. We should therefore run that simulation, update the win and visit statistics for that node and its related nodes, and then start the process all over again by applying the flowchart to node 1.

If the node under review is not a stand node, we continue by asking whether this is our first visit to the node. If so, we have found a good candidate for the next available simulation. We should have the computer simulate a game at this previously unvisited node, update the relevant win and visit statistics, and then (again) go back to node 1 to start the process once more.

Continuing down the flowchart, we reach the critical step: if the node at issue is not a stand node and has been visited at least once, there is no reason to run a simulation at this node. This is our point about how simulations that focus on children can be more informative than additional simulations that focus on their parents. Concretely, if node 2 has already been visited, it is better to run the next simulation at either node 4 or node 5 because that would give information about node 2 and then also give information about node 4 or 5. Similarly, if node 4 has already been visited, it is better to run the next simulation at node 6 or 7, in that way learning about node 4 and in addition learning about node 6 or 7. Thus, at this point in the flowchart, because the node under consideration is not a stand node and has already been visited once, the computer should choose one of the node’s downstream children and start the flowchart anew, but applied to that chosen child. If that child is a stand node, simulate. If that child has never been visited, simulate. If that child is not a stand node and has already been visited once, pick one of its children and start the flowchart again.
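
The three flowchart questions translate into a short loop. In the sketch below, the rule for choosing between children is deliberately left as a parameter, `pick_child`, and the tree grows one level at a time as the loop descends. The class and function names are illustrative assumptions, and a fuller version would presumably also treat a busted hand or a hand worth exactly 21 as an endpoint.

```python
class Node:
    def __init__(self, is_stand=False, parent=None):
        self.is_stand = is_stand
        self.parent = parent
        self.visits = 0
        self.children = []   # filled in on demand: [take child, stand child]


def choose_simulation_node(root, pick_child):
    """Walk down from the root and return the node where the next simulation should run."""
    node = root
    while True:
        if node.is_stand:          # only one simulation is possible: the dealer plays out
            return node
        if node.visits == 0:       # first visit: simulate right here
            return node
        if not node.children:      # grow the tree one level when it is first needed
            node.children = [Node(False, node), Node(True, node)]
        node = pick_child(node)    # descend, then apply the flowchart to that child


always_take = lambda node: node.children[0]
root = Node()
first = choose_simulation_node(root, always_take)
print(first is root)  # True
```

After the simulation runs, the visit counts along the path are updated and the next call starts again from the root, just as the flowchart loops back to node 1.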

那么,如何在两个子节点之间做出选择,就是我们需要克服的最后一个智力障碍。直观地说,选择应该基于两个考虑。首先,我们应该优先选择看起来更有希望的子节点,以便进一步验证这条有希望的路径。例如,如果节点 4 已经被访问过 12 次,并且与 11 次胜利相关联,那么就有充分的理由选择节点 4,并最终在其子节点、孙节点或曾孙节点上运行模拟。这样,计算机就能更多地了解与这个有希望的节点相关的下游模式。其次,有时这一点会把选择推向相反的方向:如果一个子节点的访问频率远低于另一个子节点,我们应该优先考虑该子节点及其子节点、孙节点等等,以防早期模拟出现误导。至少,这第二个考虑因素应该强烈偏向一个从未被访问过、因此与其兄弟节点相比完全未知的子节点。

The question of how to choose between two children is then the last intellectual hurdle we need to clear. Intuitively, the choice should be based on two considerations. First, we should favor the child that seems more promising, so that we can further validate the promising path. For example, if node 4 has been visited a dozen times and has been associated with eleven wins, there is a strong argument for picking node 4 and ultimately running a simulation at one of its children, grandchildren, or great-grandchildren. The computer in that way will learn more about the downstream patterns associated with this promising node. Second, and sometimes pushing the other way, if one child has been visited much less frequently than the other, we should favor that child and its children, grandchildren, and so on, just in case the early simulations were misleading. At a minimum, this second consideration should strongly favor a child that has never been visited and hence is a complete unknown compared to its sibling.

我们可以在标准的FINDBESTMOVE()函数中轻松实现所有这些功能,简单得令人难以置信。记住,我们已经编写了一些游戏专用函数,例如MAKEDECK()函数,用于创建由所有剩余牌组成的牌堆;NEWCARD()函数,用于从该牌堆中随机抽取一张新牌;MAKEHAND()函数,用于创建用于模拟的样本牌型;PLAYTO17()函数,用于实现庄家的 17 点或以上策略;PLAYRANDOMLY()函数,用于运行单次模拟并报告结果;SCOREGAME()函数,用于比较COMPUTERHAND和DEALERHAND,以确定计算机赢、庄家赢还是平局。现在,我们需要做的就是按照流程图所示的模式调用这些函数。

We can implement all this surprisingly easily in our standard FINDBESTMOVE() function. Remember, we have already written game-specific functions like MAKEDECK() to create a deck made up of all still-available cards, NEWCARD() to draw a random new card from that deck, MAKEHAND() to create a sample hand to use in the simulation, PLAYTO17() to implement the dealer’s 17-or-more strategy, PLAYRANDOMLY() to run a single simulation and report back, and SCOREGAME() to compare COMPUTERHAND with DEALERHAND so as to determine whether the computer won, the dealer won, or there was a tie. Now all we need to do is call those functions in the pattern suggested by the flowchart.

首先建立一些直观的变量。让VISITS[1]表示访问节点 1 的次数,VISITS[2]表示访问节点 2 的次数,以此类推。随着新节点的添加,我们会向此列表添加新条目。类似地,让WINS[1]表示与节点 1 关联的当前分数,WINS[2]表示与节点 2 关联的当前分数,以此类推,同样根据需要添加条目。请注意,像WINS[3]这样的变量,与节点 3 关联的每次胜利都会增加 1,与节点 3 关联的每次失败都会减少 1。深入分析后,如果WINS[3]等于 9,则意味着计算机在玩涉及节点 3 的游戏时,迄今为止的胜利次数比失败次数多 9 次。

Start by establishing some intuitive variables. Let VISITS[1] represent the number of visits made to node 1, VISITS[2] represent the number of visits made to node 2, and so on. We will add new entries to this list as we add new nodes over time. Similarly, let WINS[1] represent the current score associated with node 1, WINS[2] represent the current score associated with node 2, and so on, again adding entries as we need them. Note that a variable like WINS[3] will increase by 1 for each win associated with node 3 and decrease by 1 for each loss associated with node 3. Deep into our analysis, then, if WINS[3] is equal to 9, the implication is that the computer has thus far enjoyed nine more wins than losses when playing games that involved node 3.
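As a small illustration of how these two lists behave, here is a Python sketch. The lowercase names visits, wins, and record_result are stand-ins chosen for this example, and leaving index 0 unused is an assumption made so that list positions match the node numbers in the text.

```python
# Index 0 is unused so that list positions match the node numbers in the text.
visits = [0, 0, 0, 0]   # visits[1] .. visits[3]
wins = [0, 0, 0, 0]     # net scores: +1 per win, -1 per loss, 0 per tie

def record_result(node, result):
    """Record one simulated game at node; result is +1, -1, or 0."""
    visits[node] += 1
    wins[node] += result

# Eleven games associated with node 3: ten wins and one loss.
for result in [+1] * 10 + [-1]:
    record_result(3, result)

# wins[3] is now 9: nine more wins than losses across 11 visits.
```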

这样,FINDBESTMOVE()就可以通过将变量CURRENTNODE设置为关注节点 1 来开始探索。从这里开始,只要还有更多模拟需要运行,代码就应该首先检查CURRENTNODE是否是站立节点。我们的编号系统使这变得简单,因为所有站立节点都用大于 1 的奇数表示。如果CURRENTNODE是站立节点,计算机将取出到达该节点所需的任意数量的牌,然后按照流程图所示,模拟一局游戏,更新所有相关的WINS和VISITS计数器,然后重置各个变量以进行新的一轮游戏。

With that, FINDBESTMOVE() can start its exploration by setting the variable CURRENTNODE to focus on node 1. From there, as long as there are more simulations to run, the code should first look to see if CURRENTNODE is a stand node. Our numbering system makes that easy because all stand nodes are represented by odd numbers greater than 1. If CURRENTNODE is a stand node, the computer takes however many cards are needed to reach that node and then, as the flowchart suggests, simulates a game, updates all the relevant WINS and VISITS counters, and then resets the various variables for a new pass.

接下来,如果CURRENTNODE不是站立节点,计算机需要检查这是否是它第一次访问该节点。如果是,流程图表明当前模拟的最佳用途是探索这个从未访问过的节点,具体方法是抽取到达此节点所需的牌,然后从那里随机玩游戏。一旦知道结果,计算机将再次更新VISITS和WINS,再次重置变量,然后再次重新启动流程图。这里有一个问题:因为这是第一次访问该节点,所以现在是创建变量来表示该节点的子节点的好时机。我们现在不会对这些变量做任何事情,但这样一来,每当流程图把我们带回树的这一部分时,这些节点都已准备好接受探索。因此,在下面的代码中,在运行此模拟之前,我们在VISITS和WINS列表中为两个新条目腾出了空间。

Next, if CURRENTNODE is not a stand node, the computer needs to check whether this is its first visit to the node. If so, the flowchart indicates that the best use of the current simulation is to explore this never-before-visited node, specifically by drawing however many cards are needed to reach this node and then randomly playing the game from there. Once the results are known, the computer again updates VISITS and WINS, again resets the variables, and once more starts the flowchart anew. One wrinkle here: because this is the first visit to the node in question, it’s a good time to create variables to account for this node’s children. We will not do anything with those variables at this point, but this way those nodes are ready for exploration whenever our flowchart brings us back to this part of the tree. Thus, in the code below, before running this simulation, we make room for two new entries in the VISITS and WINS lists.
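The step of making room for two new entries might look like the sketch below. It is a hedged illustration, not the CodeLink code: child_numbers and make_room_for_children are hypothetical names, index 0 of each list is assumed to be unused, and the special case for node 1 (children 2 and 3) is inferred from the examples in the text.

```python
def child_numbers(node):
    """Return (take_child, stand_child) under the chapter's odd/even scheme."""
    # Node 1 is the exception; elsewhere children are +2 (take) and +3 (stand).
    return (2, 3) if node == 1 else (node + 2, node + 3)

def make_room_for_children(node, visits, wins):
    """Extend both lists with zeros so the two children of node have entries."""
    take_child, stand_child = child_numbers(node)
    while len(visits) <= stand_child:
        visits.append(0)
        wins.append(0)
    return take_child, stand_child
```

Called on a first visit to node 2, for instance, this leaves zeroed entries for nodes 4 and 5 ready for later passes.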

我们最后一段代码必须处理已经访问过CURRENTNODE的情况。记住,在这种情况下,我们需要从两个可用的子节点中选择一个,然后重复上述所有分析,但重点关注所选的子节点。我们把相关函数命名为CHOOSECHILD(),让它以两个子节点的WINS和VISITS作为输入,并返回相应的拿牌子节点或站立子节点的编号作为输出。为了做出选择,该函数将执行我们的两个直观规则:优先选择胜率更高的节点,以便这些更有希望的节点被更频繁地访问;但有时也会选择胜率较低的节点,以防早期模拟结果出现误导。

Our final bit of code must then address situations where we have already visited CURRENTNODE. Remember, in those instances, we need to pick one of the two available children and then repeat all of the above analysis but focused on the chosen child. Let’s call the relevant function CHOOSECHILD() and have it take as input the WINS and VISITS associated with both kids and return as output the node number for either the relevant child take node or the relevant child stand node. To make that choice, the function will implement our two intuitive rules: favor nodes that have higher win rates such that those more promising nodes are visited more often, but sometimes choose less promising nodes, just in case the early simulations were misleading.
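One common way to balance those two rules is a UCB1-style score, a standard formula from the Monte Carlo tree search literature. The text here does not say which formula CHOOSECHILD() actually uses, so the sketch below is an assumption; the argument layout and the exploration constant 1.4 are likewise hypothetical.

```python
import math

def choose_child(take_node, take_wins, take_visits,
                 stand_node, stand_wins, stand_visits, explore=1.4):
    """Return the node number of the child to explore next (UCB1-style rule)."""
    # A never-visited child is a complete unknown, so it goes first.
    if take_visits == 0:
        return take_node
    if stand_visits == 0:
        return stand_node
    total = take_visits + stand_visits

    def score(wins, visits):
        # Net score per visit rewards promising children; the square-root
        # bonus grows for a child visited rarely compared with its sibling.
        return wins / visits + explore * math.sqrt(math.log(total) / visits)

    if score(take_wins, take_visits) >= score(stand_wins, stand_visits):
        return take_node
    return stand_node
```

With equal visit counts the child with the better record wins out, while a child visited only once can still be picked over a sibling with eleven wins in twelve visits.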

从那里开始,我们的代码不断循环,直到计算机模拟次数耗尽。当这种情况发生时,就到了决策时间:计算机必须决定是拿牌还是停牌。来自节点 3 的信息提供了部分答案;节点 3 的胜率预测了如果计算机在这一轮中停牌,计算机获胜的可能性有多大。停牌是正在考虑的两个选择之一,因此胜率显然会帮助计算机做出最终决定。但是,计算机应该如何评估它的另一个选择,即拿牌的可能性呢?

Our code from there loops around and around until the computer runs out of simulations. When that happens, it’s decision time: the computer must decide whether to take a card or stand. Information from node 3 provides part of the answer; the win rate at node 3 is a prediction as to how likely it is that the computer will win if, in this round, the computer stands. That is one of the two choices being considered, so that win rate will clearly help the computer make its final determination. But how should the computer evaluate its other option, the possibility of taking a card?

直觉上,你可能会倾向于关注节点 2。然而,节点 2 的胜率很复杂,因为与节点 2 相关的WINS和VISITS记录了下游节点 4、5、6、7 及之后的所有胜利和访问次数。因此,即使模拟预测在拿两张牌后停牌(节点 7)会非常有效,节点 2 可能仍然看起来没有吸引力,因为节点 2 将这些好结果与一些可能不太有吸引力的选择混在了一起,比如只拿一张牌就停止(节点 5)以及再拿三张牌就停止(节点 9)。这是我们在本章开头提到的问题的一个版本:查看拿一张牌后的平均结果是有问题的,因为这个平均值把拿一张牌的结果和其他结果混在一起了,包括拿两张牌、拿三张牌,以及可能带来灾难性的拿十四张牌。

Intuitively, you might be tempted to focus on node 2. The win rate at node 2 is complicated, however, because the WINS and VISITS associated with node 2 capture all of the wins and all of the visits from downstream nodes 4, 5, 6, 7, and beyond. Thus, even if the simulations predict that standing after taking two cards (node 7) will work out fabulously well, node 2 might still look unattractive because node 2 combines those good outcomes with potentially less attractive options like stopping after taking just one card (node 5) and stopping after taking three more cards (node 9). This is a version of the problem we flagged at the start of the chapter: looking at the average result after taking one card is problematic because that average lumps together the take-one-card outcomes with other outcomes including take-two-cards, take-three-cards, and the likely disastrous take-fourteen-cards.

幸运的是,CHOOSECHILD()化解了困境。它是如何做到的?根据设计,我们的CHOOSECHILD()函数会不成比例地在最有希望的节点上运行模拟。因此,节点 2 的汇总统计数据并不能公平地反映所有后续节点的情况。相反,胜率较高的节点比胜率较低的节点贡献了更多胜利和访问量。因此,节点 2 的胜率信息量惊人。由于最佳节点的访问频率高于其吸引力较小的节点,因此节点 2 的胜率主要反映了这些更好的结果。

Luckily, CHOOSECHILD() saves the day. How? By design, our CHOOSECHILD() function disproportionately runs simulations at the most promising nodes. Thus, the summary statistics at node 2 are not some even-handed reflection of all the nodes that follow. Instead, nodes with higher win rates contribute significantly more wins and significantly more visits than do nodes with lower win rates. The win rate at node 2 is therefore surprisingly informative. Because the best nodes are visited more often than their less attractive peers, the win rate at node 2 primarily reflects those better outcomes.

因此,节点 2 的胜率可以很好地预测计算机在拿一张牌并采取最佳行动的情况下获胜的可能性,无论是拿一张牌然后停牌,拿一张牌然后再拿另一张牌,还是拿一张牌然后再拿两张、三张甚至四张牌。考虑到这一点,如果节点 2 的胜率高于节点 3 的胜率,计算机就应该拿一张牌,否则就停牌。

The win rate at node 2 is thus a pretty good predictor of the likelihood that the computer will win if it takes one card and from there does whatever is best, whether that turns out to be taking that one card and standing, taking that one card and then later taking another card, or taking that one card and then later taking two, three, or even four more cards. Given that, at this point, the computer should take a card if the win rate at node 2 is higher than the win rate at node 3, and stand otherwise.
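That final comparison is short enough to sketch directly. In this hedged Python version, final_decision is a hypothetical name, the "win rate" is taken to be the net score divided by visits, and an unvisited node is arbitrarily given a rate of 0.0; none of those details are confirmed by the text.

```python
def final_decision(wins, visits):
    """Return "TAKE" if node 2 looks better than node 3, else "STAND"."""
    def rate(node):
        # Net score per visit; an unvisited node defaults to 0.0 (assumption).
        return wins[node] / visits[node] if visits[node] else 0.0

    # Node 2 summarizes taking a card; node 3 summarizes standing now.
    return "TAKE" if rate(2) > rate(3) else "STAND"
```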

CodeLink 提供的代码首先会在屏幕上全屏播放几局示例游戏,这样你就能看到牌并评估计算机的决策。之后,它会建议你再默默地玩 10,000 局。如果你接受,结果会有所不同,但计算机大约会赢 4,300 局,输 4,900 局,剩下的 800 局左右打平。这实际上是一个相当不错的结果,与顶级玩家在大量游戏中获得的结果相当。计算机输多赢少的原因很简单,因为计算机先手。如果计算机爆牌,计算机就输了,即使荷官在出牌后也会爆牌。事实上,这是拉斯维加斯赌场在二十一点上赚钱的主要方式。即使是完美的玩家,输的局数也会比赢的多,因为完美的玩家仍然必须先手。

The code available at the CodeLink starts by playing a few sample games, fully onscreen, so that you can see the cards and evaluate the computer’s decisions. After that, it offers to silently play 10,000 more games. When you accept, the results will vary, but the computer will win roughly 4,300 of them, lose roughly 4,900 of them, and tie the remaining 800 or so games. That is actually a pretty great result, on par with what a top player would experience over a large number of games. The computer loses more games than it wins simply because the computer goes first. If the computer busts, the computer loses, even if the dealer would also have busted had the dealer played out its hand. This is in fact the main way Las Vegas casinos make money on Blackjack. Even a perfect player will lose more games than they win because a perfect player must still play first.

章节挑战

Chapter Challenge

下表总结了玩家何时应该吃牌或停牌的传统观点,具体情况取决于玩家和庄家手中的牌。例如,当玩家手中没有A牌且牌点数加起来等于13时,表格建议,如果庄家亮出的牌小于7,玩家就停牌,否则就吃牌。相反,当玩家手中有一张A牌和一张6牌时,表格建议玩家每次都吃一张牌。

The table below summarizes the conventional wisdom as to when a player should take versus stand, keyed to the cards held by the player and the cards held by the dealer. When the player holds no aces and has cards that add up to 13, for example, the table suggests that the player stand if the dealer’s revealed card is below 7 but take otherwise. When the player has an ace and a 6, by contrast, the table advises that the player take a card every time.

本章的挑战是修改我们的代码,使计算机在每局游戏中都将我们算法的选择与表格中反映的选择进行比较。是否存在计算机拿牌的情况,即使表格建议停牌?是否存在计算机停牌的情况,即使表格倾向于拿牌?你注意到了什么规律吗?随着模拟次数的增加,会发生什么?

Your challenge for this chapter is to revise our code such that, for every game, the computer compares our algorithm’s choices to the choices reflected in this table. Are there situations where the computer takes a card even though the table suggests that it stand? Are there situations where the computer stands even though the table favors taking a card? Do you notice any patterns? What happens as you allow more and more simulations?

9

9

瞄准他人

Aiming Darts at Others

随机模拟简化了博弈树并简化了博弈树分析。这是它的真正价值,也是我们实施该方法迄今为止的经验。例如,当我们将随机模拟应用于2048 时,我们将一棵原本需要大量节点的树变成了仅使用五个节点的树:一个起始节点加上四个子节点,左/右/上/下各一个。当我们将随机模拟应用于 Blackjack 时,一棵几乎每个节点都有十二个子节点的树变成了节点最多有两个子的树。而且这些更简单的树有效。我们的2048代码与大多数人类玩家一样可靠地达到了目标。我们的 Blackjack 代码甚至可以与拉斯维加斯的职业玩家相媲美。

Random simulation simplifies game trees and streamlines game-tree analysis. That is its true value, and that has been our experience so far as we have implemented the approach. When we applied random simulation to 2048, for example, we transformed a tree that would have required an impossibly large number of nodes into a tree that used just five: a starting node plus four children, one each for left/right/up/down. When we applied random simulation to Blackjack, a tree where nearly every node would have had twelve children became a tree where nodes had at most two. And these simpler trees worked. Our 2048 code reached its target as reliably as most human players would. Our Blackjack code would compete well against even professional players in Vegas.

但对随机模拟的真正考验,既不是像2048这样的单人游戏,也不是像二十一点这样你全程玩,我全程玩的游戏。真正的考验是像井字棋或四子棋这样的回合制游戏,其中我的最佳走法取决于你的最佳反应,而你的最佳反应又取决于我接下来的最佳反应,以此类推。经验告诉我们,这些博弈树很难绘制,评估起来也极其耗时。事实上,在编写四子棋解算器时,我们最终首先通过限制计算机进行分析的深度来简化博弈树,然后,当即使这样也不够好时,通过教会计算机修剪那些在真实游戏中不可能出现的树枝,这已经足够了。但随机模拟是否是修剪和管理双人游戏中过度生长的树木的更好方法?

But the real test for random simulation is neither a one-player game like 2048 nor a you-play-in-full-then-I-play-in-full game like Blackjack. The real test is a turn-based game like tic-tac-toe or Connect Four, where my best move depends on your best response, which in turn depends on my best further response, and so on down the line. We know from experience that those game trees are difficult to sketch and painfully time-consuming to evaluate. Indeed, when we wrote our Connect Four solver, we ended up simplifying the tree first by limiting the depth to which the computer was allowed to pursue its analysis and then, when even that was not good enough, by teaching the computer to prune branches that would never plausibly be played in a real game. But might random simulation be an even better way to trim and manage two-player, overgrown trees?

在第 4 章和第 5 章末尾的挑战中,我们已经编写了四子棋所需的基本游戏功能,包括DROPCHECKER(),允许玩家将棋子放入任何空列;ISFILLED(),检查棋盘上是否有空格;MAKEBOARD(),生成一个新的样本棋盘,计算机可以在该样本棋盘上进行实验,而不会破坏正在进行的真实游戏。我们还编写了一个SCOREBOARD()函数,为正在进行的游戏分配数字分数,但这里我们可以使用一个更简单的版本,如果计算机赢得提交的棋盘,则返回 +1,如果计算机输掉该棋盘,则返回 -1,如果棋盘代表平局,则返回 0,如果没有赢家或平局,则返回CONTINUE,因此计算机需要在评估游戏之前进一步模拟。

In the challenges at the end of chapters 4 and 5, we already wrote the basic gameplay functions we need for Connect Four, including DROPCHECKER(), which allows a player to drop a checker into any empty column; ISFILLED(), which checks whether the board has any empty spaces; and MAKEBOARD(), which generates a new sample board on which the computer can experiment without corrupting the real, in-progress game. We also wrote a SCOREBOARD() function that assigns numeric scores to in-progress games, although here we can use a simpler version that returns +1 if the computer wins the submitted board, −1 if the computer loses that board, 0 if the board represents a tie, and CONTINUE if there is no winner or tie and hence the computer needs to simulate further before evaluating the game.

那么这里有什么新东西呢?随机模拟双人游戏的复杂性意味着,在编写和调试这段代码时,清晰的变量定义至关重要。鉴于此,让我们创建一个NODE数组,并使用它来跟踪我们可能需要的所有信息,包括哪些棋子由哪个玩家在何处落下;各种获胜和访问计数器,类似于我们为二十一点构建的计数器;以及关于树中哪些节点对应于下一个玩家可能做出的响应动作的信息。警告:这些信息量很大,我们需要几页纸来整理。然而,一旦我们整理好,编写实际算法的工作就会进展得很快。简而言之,这个项目的难点不在于实现流程图,而在于仔细跟踪计算机从成千上万次“我做这个,你做那个,我这样回应”的模拟中学习到的知识。

So what’s new here? The complexity of randomly simulating a two-player game means that clear variable definitions will be extremely important when writing and debugging this code. Given that, let’s create a NODE array and use it to keep track of everything we might need, including information about which checkers were dropped by which player and where; various win and visit counters, akin to the ones we built for Blackjack; and also information about which nodes in the tree correspond to the next player’s possible responsive moves. Warning: this is a ton of information and it will take us several pages to organize it. Once we do, however, the work of writing the actual algorithm will progress quickly. Put plainly, the hard part of this project is not implementing the flowchart but instead keeping careful track of what the computer learns from thousands of I-do-this, you-do-that, I-respond-this-way simulations.

从节点编号开始。上一章,对于二十一点,我们为每个节点分配了一个编号,然后将这些编号用于两个相关的目的:它们告诉我们该节点是否可以有子节点,以及它们告诉我们该节点在我们简化的树中的确切位置。具体来说,偶数节点始终是计算机可以从中拿牌或保持原状的节点。并且每个拿牌的子节点的节点编号始终比其父节点的编号大 2。与此同时,在奇数节点,计算机不再接受额外的牌,这些节点的编号始终比其父节点的编号大 3。上一章唯一的例外是第一个节点,我们标记为奇数 1,但被视为具有偶数节点的所有属性。

Start with the node numbers. Last chapter, for Blackjack, we assigned each node a number and then used those numbers for two related purposes: they told us whether the node could have children, and they told us exactly where the node sat in our simplified tree. Specifically, even-numbered nodes were always nodes from which the computer could either take a card or stand pat. And every take-a-card child always had a node number that was two greater than its parent’s number. At odd-numbered nodes, meanwhile, the computer could no longer accept additional cards, and those nodes always had a number that was three greater than their parent’s number. The only exception last chapter was the first node, which we labeled with the odd number 1 but treated as having all the properties of an even-numbered node.
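To see how much the Blackjack numbers encode, here is a sketch that recovers a node's entire route from the number alone. The function names are hypothetical, and treating nodes 2 and 3 as the children of node 1 is an inference from the examples in the previous chapter's discussion.

```python
def parent(node):
    """Return the parent of a Blackjack node under the odd/even scheme."""
    if node <= 1:
        raise ValueError("node 1 is the starting node and has no parent")
    if node in (2, 3):            # the first node is the one exception
        return 1
    # Even (take) nodes sit 2 above their parent; odd (stand) nodes sit 3 above.
    return node - 2 if node % 2 == 0 else node - 3

def path_from_start(node):
    """List the nodes visited on the way from node 1 down to node."""
    path = [node]
    while path[-1] != 1:
        path.append(parent(path[-1]))
    return path[::-1]
```

For example, node 7 (stand after taking two cards) unwinds to nodes 1, 2, 4, and 7, which is exactly the set of counters a simulation at node 7 would update.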

然后,节点与另外两条信息相关联:该节点被访问的次数,以及从计算机的角度来看与该节点相关的净胜负分数。我们将这些信息存储在两个单独的列表中,并使用节点编号作为索引。也就是说,VISITS[5]跟踪节点 5 参与完整模拟的次数,WINS[3]存储在节点 3 获得的净胜次数。

Nodes were then associated with two other pieces of information: the number of times the node had been visited and the net win/loss score associated with that node from the computer’s perspective. We stored this information in two separate lists using the node number as the index. That is, VISITS[5] tracked the number of times node 5 was part of a full simulation, and WINS[3] stored the net number of wins achieved at node 3.

这次,我们将再次对节点进行编号,但我们将放弃奇偶编号方案,而是保留所有必要节点关系的动态列表。因此,虽然每个节点都会被分配一个编号,但每个节点都会被明确标记,其中包含我们需要的信息,包括它在树中的位置以及它在分析中的总体用途。

This time, we will again number the nodes, but we will abandon our odd/even numbering scheme and instead keep a running list of all the necessary node relationships. Thus, while each node will be assigned a number, from there every node will be explicitly labeled with the information we need about its place in the tree and its overall purpose in the analysis.

第一点关键信息是:对于每个节点,我们将记录是计算机(C)还是其对手(R)下了那个将我们带到该节点的棋子。这一点很重要,因为四子棋是回合制游戏,所以有时计算机会选择最符合自身利益的节点,但有时计算机也需要从对手的角度来思考游戏。我们在第四章第一次思考极小极大时也遇到过同样的问题。而这两次的结果都是,我们必须追踪是哪个玩家做出了将我们带到游戏特定时刻的决定。

The first bit of critical information is this: for each node, we will record whether the computer (C) or its rival (R) played the checker that brought us to that node. This is important because Connect Four is a turn-based game, and so sometimes the computer will want to choose the node that best serves its own interest, but sometimes the computer will need to think about the game from its rival’s perspective. We saw this same issue back in chapter 4 when we first thought about minimax. And both times, the upshot is that we have to keep track of which player made the decision that brought us to any specific moment in the game.

我们该如何具体地做到这一点?最终,我们将把每个节点看作是它自己的 11 个条目的列表,这样起始节点的第一个条目将是NODE[0][0],起始节点的第二个条目将是NODE[0][1],第二个节点的第一个条目将是NODE[1][0]。使用该命名约定,如果游戏以计算机的对手占据第 5 列开始,我们可以通过将字母R放入内存中位置NODE[0][0]并将数字 5 放入位置NODE[0][1]来初始化NODE数组。请注意,NODE[0][0]将始终设置为字母R,以指示先前的移动(如果有)是由对手完成的。换句话说,每棵树都应该以对手节点开始,因为我们只在轮到计算机玩时才构建树。接下来的七个节点将代表计算机可能的下一步动作:在第 1 列中放置一个棋子,在第 2 列中放置一个棋子,依此类推,直到第 7 列。

How should we do that concretely? Ultimately we are going to think of each node as being its own list of eleven entries, such that the first entry for the starting node will be NODE[0][0], the second entry for the starting node will be NODE[0][1], and the first entry for the second node will be NODE[1][0]. Using that naming convention, if the game begins with the computer’s rival taking column 5, we can initialize the NODE array by placing the letter R into memory at position NODE[0][0] and putting the number 5 at position NODE[0][1]. Note that NODE[0][0] will always be set to the letter R to indicate that the prior move, if any, was made by the rival. Put differently, every tree should start with a rival node because we build trees only when it is the computer’s turn to play. The next seven nodes will then be nodes that represent the computer’s possible next moves: dropping a checker in column 1, dropping a checker in column 2, and so on, through column 7.

在继续思考我们将存储在NODE数组中的其他信息之前,让我们先暂停一下,介绍一个良好的编程习惯:将单词WHO定义为 0,将单词WHERE定义为 1。为什么要这样做?这样,当我们查看节点 3 时,我们可以通过可读性极强的NODE[3][WHO]来识别掉落相关棋子的玩家,并且可以通过可读性极强的NODE[3][WHERE]来识别玩家选择的列。这将帮助我们编写干净的代码。如果没有标签,我们就必须记住每个列表中的第一个条目是玩家的身份,第二个条目是列号,依此类推,最终会变成令人痛苦的十一个条目。标签使我们的生活更轻松,并降低了在使用NODE数组时犯编程错误的风险。

Before continuing to think about the other information we will store in the NODE array, let’s pause and introduce a good programming practice: let’s define the word WHO to mean 0 and the word WHERE to mean 1. Why would we do that? So that when we are looking at, say, node 3, we can identify the player who dropped the relevant checker as the very readable NODE[3][WHO], and we can identify the column that player chose by using the very readable NODE[3][WHERE]. This will help us write clean code. Without labels, we would have to remember that the first entry in each list is the identity of the player, the second is the column number, and so on, for what will turn out to be a painful eleven entries. Labels make our life easier and reduce the risk that we will make programming mistakes as we use the NODE array.
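In Python, the labeling practice might look like the following sketch. The lowercase name node and the use of None as a placeholder for entries not yet discussed are assumptions made for illustration.

```python
# Named positions within each node's list of eleven entries.
WHO = 0      # which player dropped the checker that led to this node
WHERE = 1    # which column that checker was dropped into

ENTRIES_PER_NODE = 11

# Start the array with a single node whose entries are not yet filled in.
node = [[None] * ENTRIES_PER_NODE]

# The rival ("R") opened by taking column 5.
node[0][WHO] = "R"
node[0][WHERE] = 5
```

Reading node[0][WHO] later is far less error-prone than remembering what node[0][0] was supposed to hold.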

接下来是一个熟悉的条目,用于跟踪节点访问。该条目的作用与Blackjack 中的VISITS列表相同。遵循我们之前提到的WHO / WHERE标记实践,我们可以将VISITS定义为 2,然后使用诸如NODE[3][VISITS]之类的变量来跟踪节点 3 的访问次数,使用NODE[7][VISITS]来跟踪节点 7 的访问次数。

Next up is a familiar entry, one that tracks node visits. This entry will serve the same purpose that our VISITS list served in Blackjack. Sticking with our helpful WHO/WHERE labeling practice, we can define VISITS to mean 2 and then use variables like NODE[3][VISITS] to keep track of the number of times node 3 was visited and NODE[7][VISITS] to keep track of the number of times that node 7 was visited.

这里唯一的新问题在于更新这些访问计数的过程。在二十一点中,我们的奇数/偶数编号方案使更新变得简单。访问任何奇数节点始终意味着更新该节点的访问计数,然后更新小三个数字的节点的访问计数,之后更新所有偶数节点的访问计数,一直到起始点。访问任何偶数节点意味着更新该节点以及其前的所有偶数节点。相比之下,这里每个节点后面最多可能有七个其他节点,每列一个,因此没有可用于更新的简单数学模式。为了解决这一挑战,我们将使用PATH变量来跟踪我们是如何到达当前节点的。也就是说,如果计算机正在模拟从节点 0 到节点 5、到节点 14、到节点 28 的游戏,PATH变量将按顺序列出数字 0、5、14 和 28。当需要更新访问计数时,PATH将告诉计算机要更改哪些计数,并且它会这样做而不需要任何巧妙的数学或分析。

The only new wrinkle here comes in the process of updating these visit counts. In Blackjack, our odd/even numbering scheme made updating easy. A visit to any odd-numbered node always meant updating the visit count for that node, for the node three numbers smaller, and, after that, for every even node all the way back to the start. A visit to any even-numbered node meant updating that node and also updating every prior even node. Here, by contrast, each node is followed by up to seven possible further nodes, one for each column, and thus there is no simple mathematical pattern we can use for updates. To address that challenge, we will use a PATH variable to keep track of how we reached the node at hand. That is, if the computer is simulating a game that travels from node 0 to node 5, to node 14, to node 28, the PATH variable will list, in order, the numbers 0, 5, 14, and 28. When it comes time to update visit counts, PATH will tell the computer which tallies to change, and it will do so without requiring any clever math or analysis.
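A sketch of the PATH idea follows, using the route from the example above. The dictionary node_table and the function update_visits are hypothetical; only the position VISITS = 2 comes from the text, and the other entries are filled with placeholder values.

```python
VISITS = 2   # position of the visit counter within each node's list

def update_visits(node_table, path):
    """Add one visit to every node the current simulation passed through."""
    for node_number in path:
        node_table[node_number][VISITS] += 1

# Placeholder nodes: [who, where, visits], just enough for the example.
node_table = {n: [None, None, 0] for n in (0, 5, 14, 28, 30)}
path = [0, 5, 14, 28]
update_visits(node_table, path)
```

Node 30 is not on the path, so its counter stays at zero while the four visited nodes each gain one visit.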

每个节点所需的第四个信息是获胜计数,它将取代我们在二十一点游戏中独立的WINS列表,并在此处标记为WINS。更新此计数很容易;我们将再次使用PATH变量。除此之外,WINS的工作原理与以前完全相同。每当计算机获胜时,我们将加 1,每当计算机输时,我们将减 1,并且对于以平局结束的任何模拟,总数保持不变。唯一的新复杂性在于,当我们存储用于四子棋的计数时,我们将从把我们带到该节点的那位玩家的角度来记录它。例如,如果NODE[3][WHO]表示是计算机下了棋才把我们带到了节点 3,那么如果计算机在这个节点上赢的局数多于输的局数,则NODE[3][WINS]为正,否则为负。同样,如果NODE[9][WHO]表示是计算机的对手下了棋才把我们带到了节点 9,那么如果对手在这个节点上赢的局数多于输的局数,则NODE[9][WINS]为正,否则为负。

The fourth bit of information we need for each node is the win count, which replaces our standalone WINS list from Blackjack and will be labeled WINS here. Updating this count is easy; once more, we will use the PATH variable. Beyond that, WINS works exactly as it did before. We will add 1 whenever the computer wins, subtract 1 whenever the computer loses, and leave the total unchanged for any simulation that ends in a tie. The only new complexity is that when we store the count for use in Connect Four, we will write it from the perspective of the player who brought us to this node. So, for example, if NODE[3][WHO] indicates that it was the computer who played the checker that brought us to node 3, then NODE[3][WINS] will be positive if the computer has won more games at this node than it has lost, but negative otherwise. Similarly, if NODE[9][WHO] indicates that it was the computer’s rival who played the checker that brought us to node 9, then NODE[9][WINS] will be positive if the rival has won more games at this node, but negative otherwise.
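The perspective flip can be sketched as a sign change during the update. This is an illustration under assumptions: update_wins and node_table are hypothetical names, and placing WINS at position 3 is inferred from its being the fourth piece of information rather than stated outright.

```python
WHO = 0
WINS = 3     # assumed position: the fourth entry in each node's list

def update_wins(node_table, path, result):
    """Record a result (+1 computer won, -1 computer lost, 0 tie) along path."""
    for node_number in path:
        if node_table[node_number][WHO] == "C":
            node_table[node_number][WINS] += result   # computer's perspective
        else:
            node_table[node_number][WINS] -= result   # rival's perspective

# Players alternate down the path: rival, computer, rival, computer.
node_table = {0: ["R", 6, 0, 0], 5: ["C", 5, 0, 0],
              14: ["R", 2, 0, 0], 28: ["C", 4, 0, 0]}
update_wins(node_table, [0, 5, 14, 28], +1)   # the computer won this game
```

After one computer win, the computer's nodes (5 and 28) show +1 and the rival's nodes (0 and 14) show -1.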

树中的每个节点最终都会关联 11 条信息。本章最大的挑战是设计一个变量来组织所有这些信息。

There are ultimately going to be eleven pieces of information associated with each node in the tree. The big challenge this chapter is designing a variable to organize them all.

我们需要在NODE数组中构建的最后一点信息是关于如何找到与每个特定节点关联的子节点的信息。这将需要七个条目,每个条目对应下一位玩家可能放下棋子的一列。也就是说,第一个条目应该是如果下一位玩家将棋子放在第一列,我们将到达的子节点的节点号。第二个条目应该是如果下一位玩家将棋子放在第二列,我们将到达的子节点的节点号。以此类推,所有七列,因此有七个可能的子节点。

The final bit of information we need to build into the NODE array is information about where to find the children associated with each specific node. This will require seven entries, one for each of the columns in which the next player might drop a checker. That is, the first entry should be the node number for the child we will reach if the next player places its checker in the first column. The second entry should be the node number for the child we will reach if the next player places its checker in the second column. And so on, for all seven columns and hence seven possible children.

请注意,对于第一个节点,其子节点的编号始终是直观的:节点号 1 对应第 1 列,节点号 2 对应第 2 列,直到节点号 7 对应第 7 列。然而,在树的更深层,其模式是无法预测的。节点 9 的第一个子节点可能是节点号 15,或者 85,甚至 127。为什么?因为计算机会根据需要添加节点,而不是从一开始就添加所有节点。例如,假设第一个模拟测试涉及节点 2 的游戏,因此使用数字 8 到 14 将节点 2 的子节点添加到数组中。假设下一个模拟测试涉及节点 6 的游戏,因此使用数字 15 到 21 将节点 6 的子节点添加到数组中,之后的模拟测试节点 7 并使用数字 22 到 28 作为其子节点。如果下一个模拟在节点 3 进行,则节点 3 的子节点将被分配数字 29 到 35,即使我们可能预期节点 1 使用数字 8 到 14,节点 2 使用数字 15 到 21,节点 3 使用数字 22 到 28。根本无法预测所有这些,因为实际数字取决于每个节点何时(如果有的话)添加到树中。因此,我们必须在进行过程中生成并存储这些数字。

Note that, for the first node, the children’s numbers will always be the intuitive ones: node number 1 for column 1, node number 2 for column 2, up to node number 7 for column 7. Deeper in the tree, however, the pattern is impossible to predict. The first child for node 9 might be node number 15, or 85, or even 127. Why? Because the computer will be adding nodes as needed rather than adding them all right from the start. Suppose, for example, that the first simulation tests out a game involving node 2 and hence adds node 2’s children to the array using numbers 8 through 14. Suppose that the next simulation tests out a game involving node 6 and so adds node 6’s children to the array at numbers 15 through 21, and the simulation after that tests node 7 and uses for its children numbers 22 through 28. If the next simulation plays out at node 3, node 3’s children will be assigned numbers 29 through 35 even though we might have expected node 1 to use numbers 8 to 14, node 2 to use numbers 15 to 21, and node 3 to use numbers 22 through 28. There is simply no way to predict all this because the actual numbers depend on when, if ever, each node is added to the tree. We thus have to generate and store these numbers as we go.

幸运的是,所有这些都可以在NODE数组中轻松处理。具体方法如下:每当计算机向树中添加新的子节点时,计算机都会使用接下来的七个可用节点编号(无论它们是什么),然后将这些编号记录到我们的数组中。例如,当计算机将与节点 9 关联的子节点添加到树中时,如果可用节点编号为 211 到 217,则第一个子节点将被分配给节点 211,第二个子节点将被分配给节点 212,最后一个子节点将被分配给节点 217。这些赋值语句可以写成非常易读的形式,例如NODE[9][CHILD1] = 211和NODE[9][CHILD2] = 212。

Luckily, all this is easy for us to handle in our NODE array. Here’s how. Whenever the computer adds new children to the tree, the computer will use the next seven available node numbers, whatever they are, and then record those numbers in our array. For instance, when the computer adds the children associated with node 9 to the tree, if the available node numbers are numbers 211 through 217, the first child will be assigned to node 211, the second child will be assigned to node 212, and the last child will be assigned to node 217. These assignments can be written in very readable form as, for example, NODE[9][CHILD1] = 211 and NODE[9][CHILD2] = 212.
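The bookkeeping can be sketched as below. Several details are assumptions: the helper names new_node and add_children, the positions WINS = 3 and CHILD1 = 4 (inferred from the eleven-entry layout), and the simplification that all seven columns are still open.

```python
WHO, WHERE, VISITS, WINS = 0, 1, 2, 3
CHILD1 = 4   # CHILD1..CHILD7 are assumed to occupy positions 4 through 10

def new_node(who, where):
    """Build one eleven-entry node with zeroed counters and no children yet."""
    return [who, where, 0, 0] + [None] * 7

def add_children(node, parent):
    """Give parent seven children using the next seven free node numbers."""
    mover = "C" if node[parent][WHO] == "R" else "R"   # players alternate
    for column in range(1, 8):
        node[parent][CHILD1 + column - 1] = len(node)  # next available number
        node.append(new_node(mover, column))

node = [new_node("R", 6)]        # starting node: the rival just took column 6
for parent in (0, 2, 6, 7, 3):   # the order of expansion from the example
    add_children(node, parent)
```

Replaying the example order gives node 3 the children 29 through 35, matching the text; a fuller version would also skip columns that are already full.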

解释得有点多,所以让我们把它们整合起来,深入了解一下NODE数组的实际作用。下图展示了计算机在某个正在进行的棋盘上运行了 1000 次模拟后,在真实比赛中NODE数组的情况。NODE[0][WHO]和NODE[0][WHERE]的位置被正确标记,表明在之前的一步中,计算机的对手在第 6 列落下了一枚棋子。NODE[0][VISITS]中的计数表明计算机以此为起点进行了 1,000 次模拟,NODE[0][WINS]中的计数告诉我们,计算机总体运行良好;在这些模拟中,计算机的对手输掉的棋局比赢的多 252 局。事实上,计算机似乎有一步妙棋:在第四列落下一枚棋子,计算机在 549 次尝试后获得了 194 的净正分。对于计算机来说,这是一个相当不错的胜率,这也解释了为什么计算机访问节点 4 的次数如此之多,而访问节点 7 的次数仅为 21 次,净负分仅为 5 分。

That is a lot of explanation, so let’s pull it together and take an under-the-hood peek at the NODE array in action. The chart below shows the NODE array from a real game after the computer ran 1,000 simulations on some in-progress board. Positions NODE[0][WHO] and NODE[0][WHERE] are correctly marked to show that, in the prior move, the computer’s rival dropped a checker in column 6. The count in NODE[0][VISITS] shows that the computer ran 1,000 simulations involving this starting point, and the count in NODE[0][WINS] tells us that things went well for the computer overall; across these simulations, the computer’s rival lost 252 more games than it won. In fact, it looks like the computer has a great move to make: dropping a checker in the fourth column gave the computer a net positive score of 194 after 549 tries. That’s a pretty good win rate for the computer, which also explains why the computer visited node 4 so often as compared to, say, node 7, which shows a net loss of 5 and as a result was visited only 21 times.

节点数组的快照仅显示了前八个节点:起始节点,后面跟着代表计算机现在必须考虑的七个走法的七个节点。数组中更深层的节点分别代表计算机对手的后续走法以及计算机的后续走法。

This snapshot of the node array shows only the first eight nodes: the starting node followed by the seven nodes that represent the seven moves the computer must now consider. Deeper in the array are nodes that represent later moves for the computer’s rival and later moves for the computer, too.

现在,我们的NODE变量已准备就绪,可以构建一些函数,这些函数与我们之前为更简单的二十一点程序构建的函数类似。首先从PLAYRANDOMLY()开始。上次,该函数将两位玩家的牌作为输入,并在进行一些游戏相关的清理后,随机选择TAKE(拿牌)或STAND(停牌),直到计算机爆牌或其牌加起来达到完美的 21 点。之后,轮到庄家出牌,该函数会报告在这场随机游戏中,计算机获胜、庄家获胜还是平局。新的PLAYRANDOMLY()可以遵循相同的流程。该函数将当前棋盘以及指示最近下棋的是计算机还是对手的指示作为输入。从这里开始,玩家轮流在任何可用的列中随机下棋,以此模拟一场真正的随机游戏。该函数最后调用SCOREBOARD(),如果计算机获胜则返回结果 +1,如果对手获胜则返回 -1,如果玩家平局则返回 0。

With our NODE variable now ready for use, we can build functions that mirror the ones we already built for our simpler Blackjack program. Start with PLAYRANDOMLY(). Last time, that function took as inputs the two players’ hands and, after doing some game-specific cleanup, randomly chose whether to TAKE or STAND until the computer either busted or its hand added up to a perfect 21. After that, the dealer would take its turn and the function would report back whether, in that one random game, the computer won, the dealer won, or there was a tie. The new PLAYRANDOMLY() can follow this same flow. The function takes as inputs the current gameboard and an indication of whether the most recent checker was played by the computer or its rival. From there, the players take turns, randomly dropping checkers in any available column and through that simulating a truly random game. The function ends by calling SCOREBOARD() and returning the result +1 if the computer won, −1 if the rival won, and 0 if the players tied.
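A self-contained sketch of the Connect Four version appears below. It is not the CodeLink code: the board layout (six rows of seven cells, row 0 at the top, 0-based columns), the lowercase function names, and the use of None in place of CONTINUE are all assumptions made so the example runs on its own.

```python
import random

ROWS, COLS = 6, 7
COMPUTER, RIVAL, EMPTY = "C", "R", "."

def drop_checker(board, col, player):
    """Place player's checker in the lowest empty cell of column col."""
    for row in range(ROWS - 1, -1, -1):
        if board[row][col] == EMPTY:
            board[row][col] = player
            return

def score_board(board):
    """Return +1 (computer won), -1 (rival won), 0 (tie), or None (continue)."""
    for r in range(ROWS):
        for c in range(COLS):
            p = board[r][c]
            if p == EMPTY:
                continue
            # Look right, down, and along both downward diagonals.
            for dr, dc in ((0, 1), (1, 0), (1, 1), (1, -1)):
                end_r, end_c = r + 3 * dr, c + 3 * dc
                if 0 <= end_r < ROWS and 0 <= end_c < COLS and all(
                        board[r + i * dr][c + i * dc] == p for i in range(1, 4)):
                    return 1 if p == COMPUTER else -1
    if all(cell != EMPTY for cell in board[0]):
        return 0          # top row full and nobody has four in a row
    return None

def play_randomly(board, last_player, rng=random):
    """Finish the game with random moves on a copy; return +1, -1, or 0."""
    board = [row[:] for row in board]     # never disturb the real board
    player = last_player
    while (result := score_board(board)) is None:
        player = RIVAL if player == COMPUTER else COMPUTER
        open_cols = [c for c in range(COLS) if board[0][c] == EMPTY]
        drop_checker(board, rng.choice(open_cols), player)
    return result
```

Passing rng=random.Random(0) makes a run repeatable, which helps when debugging.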

And that takes us to the big moment: it’s time to allocate those PLAYRANDOMLY() simulations. For Blackjack, we developed a flowchart that showed how the computer should make this decision. Here, that same flowchart still works, the only changes being those required by the language of the game itself. That is, our Blackjack flowchart began by asking whether the node at issue was a stand node. If so, we simulated the game. If not, we asked our next question, inquiring as to whether this was the computer’s first visit to that specific node. If it was, we simulated. If not, we chose one of the node’s children and started the process again. As the diagram below shows, we will follow this exact pattern for Connect Four but replace the concept of a stand node with the similar concept of a node that results in a win, loss, or tie.

Unpacking all that a bit, our Blackjack analysis began by asking whether the node under consideration was a stand node because stand nodes neither have children nor require any further decision-making by the computer. In essence, a stand node is the end of a tree branch; thus, when the computer reaches a stand node, the computer simply needs to figure out whether it or the dealer wins. For Connect Four, the analogous question is whether the node at issue is a win, a loss, or a tie. If so, we are at the end of a Connect Four tree branch, and the computer need only record that outcome by updating the relevant WINS and VISITS counters.

We can implement that process inside our normal FINDBESTMOVE() function. For each simulation, we start by adding the relevant node number to the PATH so that the computer can keep track of which nodes were visited during this simulation of the game. We then follow the chart: we build the gameboard, score it, and, if the result is a dead end, we update the various tallies and start again.

Continuing in our flowchart, in Blackjack, if the node was not a stand node, we next asked whether this was our first visit to the node. If so, we used PLAYRANDOMLY() to get information about that never-before-visited node and then went back to the top of the tree to start again. We will do the exact same thing here, albeit with our more complicated NODE variable and also with careful attention to whether this turn belongs to the computer or its rival. The code excerpt below shows the result, with comments labeling those key steps.

Lastly, in Blackjack, for a node that was already visited at least once, we would trigger CHOOSECHILD() to pick which child should now take priority. That function was in hindsight relatively simple; for Blackjack, we had to compare just two nodes, an available stand node and an available take node. This time, CHOOSECHILD() is more complex in that it has to pick from among up to seven possible children, one for each column, ignoring a node only if the relevant column is already filled to capacity. Still, our big-picture analysis remains the same. The CHOOSECHILD() function should still primarily favor nodes that seem to give a win to the player who is making the choice. At the same time, the function should sometimes pick less promising nodes, just in case those nodes look better after they are simulated more richly. We will say more about CHOOSECHILD() below; for now, let’s just stipulate that we have a function called CHOOSECHILD() and it picks the next node by comparing all the relevant win rates and all the relevant visit counts.

We have not previously focused on the details of the CHOOSECHILD() function, in part because our Blackjack code performs well even when CHOOSECHILD() is significantly imperfect. Indeed, perhaps you experimented with the function last chapter and found that, whatever you did, a few hundred simulations would always lead to a reasonably good take-or-stand decision. Connect Four, by contrast, is a much more complicated game, and thus the details behind CHOOSECHILD() matter this time around. Simply put, if the computer is allowed only a few thousand simulations, it needs to use them particularly well if it is going to have any chance of winning a game of Connect Four.

So let’s take a closer look at the CHOOSECHILD() function, which is already included in the above CodeLink. The function starts by identifying which child nodes are in fact valid options. Some children are not valid because they are associated with columns that are already filled to the brim. The code eliminates those nodes from consideration. For available children, however, the loop populates three lists: KIDNUMBERS stores the node numbers for these plausible children, KIDVISITS stores the number of times each of these children has been visited, and KIDWINS stores the net number of wins associated with each of these same nodes.

To compare the nodes, CHOOSECHILD() now checks to see if any of the nodes in contention have zero visits. If so, the function picks an unvisited node and runs a simulation. The intuition here is the familiar one, namely that every child should have at least one visit before some other child is awarded a second visit. If every child has at least one visit, however, the computer moves on, assigning scores to each child based in part on that child’s win rate and in part on the frequency with which that child has already been visited as compared to its siblings. The win rate is just that node’s ratio of net wins to total visits. The only tricky part is that the win rate must be calculated from the right player’s perspective. That is, if the seven nodes under consideration are all possible moves for the computer’s rival, a more attractive win rate is one that more often gives the rival the win. By contrast, if the seven nodes under consideration are all possible moves for the computer, a more attractive win rate is a rate that more often gives the computer the win. Luckily, these comparisons are easy to accomplish thanks to the way we have defined our variables: in our code, for every node, NODE[#][WINS] is always stored from the perspective of NODE[#][WHO]. Thus, to implement this part of its analysis, the computer need only look for the biggest positive ratio, with no need to account for the fact that computer wins are scored as +1 while rival wins are scored as −1. Lastly, note that in my code these ratios are then multiplied by 10. The idea is to give win rates extra importance in the overall CHOOSECHILD() calculation.

The other component of a node’s score is a measure of how often that node has already been visited as compared to its peers. Again, the goal is to increase a node’s score if, in comparison to the other relevant children, that child has been visited rarely. There are many ways to implement this comparison, and some of them are enormously complex. In the sample code, I start by dividing the number of visits to the parent by the number of visits to the specific child being scored. The fewer times the child has been visited, the bigger this number. I then take the square root of that result, which is quirky but serves two purposes: it keeps the ratios from becoming too large, and it scales the results such that, for example, a node that has enjoyed just 10 out of 100 available visits earns a score of 3.16 but nodes that have enjoyed 40, 50, or 60 of those visits all earn smaller and extremely similar scores of 1.58, 1.41, and 1.29, respectively. This facilitates our intuitive goal of focusing attention on the most neglected youngsters.
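To make the two components concrete, here is a minimal Python sketch of the scoring just described. The text gives the win-rate weight of 10 and the square root of the parent-to-child visit ratio; adding the two components together is an assumption of this sketch, as are the lowercase function names and the argument order.

```python
import math

def child_score(net_wins, visits, parent_visits, win_weight=10):
    """Score one child: weighted win rate plus a bonus for being neglected."""
    win_rate = net_wins / visits                    # net wins per visit
    neglect = math.sqrt(parent_visits / visits)     # large when rarely visited
    return win_weight * win_rate + neglect          # assumed: simple sum

def choose_child(kid_numbers, kid_visits, kid_wins, parent_visits):
    """Pick the node number of the child that should be explored next."""
    # Every child gets one visit before any child gets a second.
    for number, visits in zip(kid_numbers, kid_visits):
        if visits == 0:
            return number
    scores = [
        child_score(wins, visits, parent_visits)
        for visits, wins in zip(kid_visits, kid_wins)
    ]
    return kid_numbers[scores.index(max(scores))]
```

With a win rate of zero, `child_score(0, 10, 100)` reduces to the square-root term alone, which is the 3.16 figure quoted above; changing `win_weight` is the easiest way to experiment with the balance between the two components.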

Admittedly, there are CHOOSECHILD() implementations that would outperform mine, including some that rely on very sophisticated mathematical functions. Moreover, my process of developing a good CHOOSECHILD() involved some trial and error. The reason I multiply the win ratios by 10, for instance, is that multiplying by 2 and by 20 both performed worse when I tested my code in real games. For our purposes, though, the goal here is not to find the optimal implementation but instead to make sure that our scoring approach is at least roughly balancing our interest in studying promising nodes against our competing interest in continuing to explore even the seemingly undesirable ones. From there, yes, there is still lots of room to play and experiment.

Chapter Challenge

We have written two versions of Connect Four. One confronts tree complexity by limiting depth and then pruning dud branches. The other responds to tree complexity by using random simulation. Your challenge this chapter is to write code that pits those two approaches against one another, with a depth-and-pruning computer playing against a random-simulation competitor. Does one approach dominate in head-to-head competition? What happens as you vary the depth you allow for depth-and-pruning, or as you increase the number of simulations you tolerate before requiring the strategic-simulation computer to make a choice? Would an even better approach be to mix the two concepts, maybe using random simulation for moves where an explicit tree would otherwise be a painful slog but transitioning to a depth-and-pruning approach late in the game, so as to more carefully evaluate potential game-winning and game-losing scenarios? Pit these various machines against one another for a thousand games. Which version performs best? And can you, a human player, beat any of them?

Tracking and Training

10

Rock, Paper . . . Paper

In rock-paper-scissors, each player secretly positions a hand in the shape of either a rock, a piece of paper, or a pair of scissors. When a signal is given, players reveal their choices, and an intuitive rule is applied: paper covers rock, rock smashes scissors, scissors cut paper, and everything else is simply a tie. You have surely played this game from time to time and have probably thought of it as something like the flip of a fair coin or the roll of an unweighted die. That is, you have probably treated the game as if it is a completely fair, completely random way to make an unbiased choice. But is it really?

Suppose a computer is playing the game against a human player, and that human player is choosing moves such that, in every round, the player is equally likely to pick rock, scissors, or paper. In that example, no matter what the computer does, in the long run the computer will win one-third of the games, lose one-third of the games, and tie one-third of the games. Imagine the computer picks rock every time. The computer will win every game where the person picks scissors, lose every game where the person picks paper, and tie whenever the person also selects rock. Imagine instead that the computer cycles from rock to paper to scissors consistently. Or the computer always picks paper. The final result is the same. If the human player really is choosing moves completely at random, the computer’s fate is sealed. No matter what the computer does, over a large number of games, the computer will end up winning one-third of the games, losing one-third, and tying for the rest. Rock-paper-scissors is, against a truly random opponent, a fair and unbiased venture.

If the human player moves randomly, over time the computer will end up winning one-third of the games. This is true even if the computer always plays just one move, such as rock.

But now imagine that the computer is playing against a human opponent who is, well, human. Maybe they accidentally favor rock just a little more often than they should. Maybe after playing paper they are a tad too reluctant to play paper a second time, back-to-back. Maybe they tend to change their pick after a loss but stick with it after a win. Maybe they choose whatever is, at the time, their least-played option thus far. In any of these instances, the computer suddenly gains a meaningful advantage because the computer can exhaustively track its rival’s moves, identify any quirks or idiosyncrasies, and then exploit those weaknesses to win more games.

For instance, suppose the computer keeps track of the human player’s choices and begins to detect a slight tendency to pick scissors. Indeed, just to put numbers on it, imagine that instead of playing rock, paper, and scissors each roughly 33 percent of the time, the human player picks rock 30 percent of the time, paper 30 percent of the time, and scissors 40 percent of the time. Knowing this, the computer can favor rock in response. After all, if the computer plays rock, it will win 40 percent of the games and lose only 30 percent, both improvements over the truly random baseline where winning and losing are equally likely at approximately 33 percent each.
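The arithmetic behind both cases, the truly random opponent and the scissors-leaning one, can be checked with a short Python sketch. The dictionary encoding and the name `outcome_odds` are assumptions made for illustration.

```python
from fractions import Fraction

# BEATS[x] is the move that x defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def outcome_odds(computer_move, human_odds):
    """Return (win, loss, tie) probabilities for one computer move."""
    win = human_odds[BEATS[computer_move]]  # human picks the move we defeat
    tie = human_odds[computer_move]         # human picks the same move
    loss = 1 - win - tie                    # human picks the move that defeats us
    return win, loss, tie

uniform = {move: Fraction(1, 3) for move in BEATS}
scissors_heavy = {
    "rock": Fraction(3, 10),
    "paper": Fraction(3, 10),
    "scissors": Fraction(4, 10),
}

for label, odds in (("uniform", uniform), ("scissors-heavy", scissors_heavy)):
    for move in BEATS:
        win, loss, tie = outcome_odds(move, odds)
        print(f"{label:15} computer plays {move:8} win={win} loss={loss} tie={tie}")
```

Against the uniform player every computer move comes out at one-third each way. Against the scissors-heavy player, rock wins 2/5 of the time and loses 3/10, while paper is the move to avoid because it loses 2/5 of the time.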

More complicated but in the same spirit, imagine that the computer detects that its human adversary is reluctant to make the same move twice in a row. Knowing that, the computer can gain an advantage by playing scissors in any game immediately after a game where the human player played rock because, on this assumption, the human is not likely to play rock again in that next round. If the human player just played paper, by contrast, rock becomes the computer’s attractive next choice. And if the human player just played scissors, the computer’s next move should be paper.

The code below implements this approach. The function RESPONDRELUCTANT() takes as input an indication of whether the user chose rock, paper, or scissors and returns as output the move the computer should make if this user really is reluctant to repeat moves back-to-back. For instance, if RESPONDRELUCTANT() is called with an argument that indicates rock, the function suggests that the computer play scissors in the next round.
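A minimal Python sketch of such a function follows, assuming moves are encoded as the strings "rock", "paper", and "scissors"; the lowercase name is likewise an assumption.

```python
ROCK, PAPER, SCISSORS = "rock", "paper", "scissors"

def respond_reluctant(last_human_move):
    """Return the computer's move for the round after last_human_move.

    A reluctant player will probably switch to one of the other two moves,
    so the computer picks the move that beats one of them and ties the other.
    """
    responses = {
        ROCK: SCISSORS,   # expect paper or scissors next
        PAPER: ROCK,      # expect rock or scissors next
        SCISSORS: PAPER,  # expect rock or paper next
    }
    return responses[last_human_move]
```

In each case the only losing scenario is the one the reluctant player is assumed to avoid: a repeat of the previous move.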

A related function, ISRELUCTANT(), then uses data from this player’s PLAYERHISTORY to evaluate this human player. It loops through PLAYERHISTORY one historical move at a time, evaluates what would have happened had the computer used RESPONDRELUCTANT() to pick its moves in response, and summarizes the results. If playing this way would have given the computer more wins than losses historically, the function reports back that, yes, this human seems to be a reluctant player and hence vulnerable to this strategic response. If not, the function rejects the hypothesis, in essence advising the computer not to assume that this particular human will fall victim to this particular quirk.

Note that the supporting function CALCULATEOUTCOME() is not shown, but it awards 1 point for every round the computer wins, deducts 1 point for every round the computer loses, and returns 0 for ties.
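One way to sketch this evaluation in Python is shown below. The string encoding of moves, the lowercase names, and the decision to represent PLAYERHISTORY as a plain list of the human's moves in order are all assumptions of this example.

```python
# BEATS[x] is the move that x defeats.
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

# The computer's reply to a reluctant player's previous move.
RESPONSE = {"rock": "scissors", "paper": "rock", "scissors": "paper"}

def respond_reluctant(last_human_move):
    return RESPONSE[last_human_move]

def calculate_outcome(computer_move, human_move):
    """+1 if the computer wins the round, -1 if it loses, 0 for a tie."""
    if computer_move == human_move:
        return 0
    return 1 if BEATS[computer_move] == human_move else -1

def is_reluctant(player_history):
    """Replay history as if the computer had used respond_reluctant()."""
    outcome = 0
    # Pair each move with the move the human actually played next.
    for previous, actual in zip(player_history, player_history[1:]):
        outcome += calculate_outcome(respond_reluctant(previous), actual)
    return outcome > 0  # more wins than losses: the hypothesis fits
```

Returning `outcome` itself instead of the comparison would give a numeric score rather than a yes-or-no answer.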

ISRELUCTANT() identifies human players who are reluctant to allow repetition in their moves. Additional functions can then be added to address other potential human limitations. For instance, a comparable ISBORING() function might be used to evaluate whether a player noticeably favors one move over all others. Or an ISSORELOSER() function can test whether a player is particularly likely to change moves after suffering a loss. Moreover, rather than returning TRUE or FALSE, all of these functions can return the numeric value of OUTCOME so that the various possible approaches can be compared, allowing the computer to choose the response that best matches the historical data. Another improvement might be to emphasize recent gameplay over older rounds, in case the human player intentionally or inadvertently changes tendencies over time.

So far, our code looks only for imperfections that we, as programmers, anticipate. For instance, because we know that a human player might be reluctant to repeat moves back-to-back, our code looks through a player’s history to see if this specific player made that predicted mistake. Similarly, because we know that a human player might disproportionately favor rock, or might be particularly likely to change moves after suffering a loss, we were able to write code to look for those specific hiccups, too. And that works. But a better approach would be to rely less on our ability to anticipate human imperfections and more on whatever patterns the data naturally reveal.

How? Below is a sample PLAYERHISTORY showing a sequence of twelve games. This particular player started with paper, then played rock, then paper, and so on, as shown. Our challenge is the same as always: we want to help the computer predict this player’s next move.

Visualizing the history in this format, it is hard to say much about this player’s tendencies and hiccups. So let’s put the information into a slightly more useful form. This person played rock four times. After two of those rounds, this person played paper. After two of those rounds, this person played scissors. Interestingly, however, not once did this person play rock back-to-back.

Now we begin to see a pattern. Historically, if this person played rock, their next move was to play paper half of the time, scissors half of the time, and rock never. If that pattern holds, the next time this player picks rock, the computer should respond (in the game after that) with scissors because that would give the computer a 50 percent chance of a win, a 50 percent chance of a tie, and seemingly zero chance of a loss. Those numbers beat the purely random approach, which would have given the computer an equal chance of winning, losing, or tying.

We can create similar, separate summaries for paper and scissors, and those summaries can likewise be used to make predictions. If this player plays paper, for instance, we might reasonably predict that they have a one-third chance of following up with rock and a two-thirds chance of following up with scissors. The computer should thus play rock in the round after this person plays paper. If this player picks scissors, by contrast, the numbers for the move after suggest a 3 in 4 chance for rock, 1 in 4 for scissors, and 0 in 4 for paper, so the computer should play paper in that next round, hoping for the high-odds win.

To track and update this information, we can use three simple lists. Define ROCKHISTORY to be a list where the first entry tracks the number of times the player chose rock and the second, third, and fourth entries track the number of times the player followed by playing rock, paper, and scissors, respectively. The lists PAPERHISTORY and SCISSORSHISTORY similarly store the relevant counts for initial moves of paper and scissors. In the code on the next page, the function COUNT() traverses PLAYERHISTORY and creates these three move-specific lists: ROCKHISTORY, PAPERHISTORY, and SCISSORSHISTORY.

Those lists assume that good predictions can be made simply by looking at the player’s most recent move. For instance, the list ROCKHISTORY makes predictions based on the fact that, last move, the player chose rock. The lists PAPERHISTORY and SCISSORSHISTORY similarly organize information based on the single prior move of paper or scissors. That approach will work to some degree but it would be even more helpful if the computer could check for more complicated patterns of cause and effect.

To that end, the chart below shows a richer game history, this time tracking not only the human player’s moves but also whether the player won, lost, or tied that game as a result.

Again, in that format, it is hard to draw any conclusions. But take that information and create a visualization based on two pieces of information: the player’s choice of rock, paper, or scissors, and the outcome of the game. Four of the resulting graphics are shown below, and all nine can easily be summarized into lists like WHENROCKWON, WHENROCKLOST, WHENROCKTIED, and WHENSCISSORSWON.

These charts show the player’s next move after winning with rock, tying with rock, losing with paper, and losing with scissors. Similar charts can be made for each of the remaining options, such as winning with scissors or losing with rock.

These detailed lists give the computer a structured way to detect patterns. For example, if the human player played rock, lost, and next played scissors, the computer would update the list WHENROCKLOST to account for this one additional piece of information: this time, at least, a loss with rock led the player to choose scissors in the next round. Whenever it is the computer’s turn to play, the computer can use this information to predict the human’s next move and choose the optimal response.
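As an illustration, the nine lists can be kept in a single Python dictionary keyed by the pair (move, result), so that the entry for ("rock", "lost") plays the role of WHENROCKLOST. The data layout, the function names, and the rule used to pick a response (the move with the best historical wins-minus-losses tally) are assumptions of this sketch rather than the book's listing.

```python
from collections import Counter, defaultdict

MOVES = ("rock", "paper", "scissors")
# BEATEN_BY[x] is the move that defeats x.
BEATEN_BY = {"rock": "paper", "paper": "scissors", "scissors": "rock"}

def build_tables(rounds):
    """rounds: list of (human_move, human_result), result in won/lost/tied."""
    tables = defaultdict(Counter)
    for (move, result), (next_move, _) in zip(rounds, rounds[1:]):
        tables[(move, result)][next_move] += 1  # what followed this situation
    return tables

def best_response(tables, last_move, last_result):
    """Pick the computer move that would have scored best historically."""
    followups = tables.get((last_move, last_result))
    if not followups:
        return None  # no data for this situation yet

    def tally(computer_move):
        score = 0
        for human_move, times in followups.items():
            if BEATEN_BY[human_move] == computer_move:
                score += times   # computer would have won these rounds
            elif BEATEN_BY[computer_move] == human_move:
                score -= times   # computer would have lost these rounds
        return score

    return max(MOVES, key=tally)
```

When `best_response` returns `None`, the computer has no evidence for that situation and can fall back on a random move.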

Chapter Challenge

In this chapter, we used history to predict the future. For example, we tracked a human player’s historical tendency to play rock after having just played paper, and we used that information to predict what that same player might do in a future game after playing paper. But detailed histories allow us to do something even more interesting: instead of predicting, we can imitate.

Imagine, for example, building history variables that capture Taylor Swift’s tendency to use certain words right after using other words. And imagine building another set of histories that similarly track word choice in a few Shakespearean plays. Using the very techniques explored in this chapter, couldn’t we put those histories together and ask the computer to generate a few sentences of, how shall we say it, Swiftearean prose?

Friends, Romans, countrymen, prepare to shake it off, using the linked starter code to help with some of the core functions but filling in your own favorite Taylor Swift, Beyoncé, or Eminem lyrics. And yes, this is the very first step toward thinking about other language imitation technologies, including ChatGPT.
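For readers who want a feel for the idea before opening the starter code, here is a small Python sketch of word-level imitation. It is not the linked starter code; the function names and the choice to store every observed follower in a list (so that frequent followers are drawn more often) are assumptions of this example.

```python
import random
from collections import defaultdict

def build_word_history(text):
    """Map each word to the list of words that followed it in the text."""
    words = text.lower().split()
    history = defaultdict(list)
    for word, next_word in zip(words, words[1:]):
        history[word].append(next_word)  # repeats keep the true frequencies
    return history

def imitate(history, start, length, rng=random):
    """Generate up to `length` words by repeatedly sampling a follower."""
    word = start
    output = [word]
    for _ in range(length - 1):
        followers = history.get(word)
        if not followers:
            break  # this word was never followed by anything
        word = rng.choice(followers)
        output.append(word)
    return " ".join(output)
```

Building one history from several sources at once, by concatenating the texts before calling `build_word_history`, is what lets the output wander from one author's phrasing into another's.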

11

Black Boxes

A famous New Yorker cartoon by Sidney Harris depicts a mathematician standing at a chalkboard writing complicated formulas to prove some important theoretical proposition. There is an empty space in the middle of the board, and, in that space, surrounded by a sea of variables and notation, the mathematician has written in small but neat letters the phrase “then a miracle occurs.” The caption for the panel is a quote from the mathematician’s colleague: “I think you should be more explicit here in step two.”

Throughout this book, we have consistently heeded the cartoon’s advice. When we were first teaching the computer to play sudoku, tic-tac-toe, and Connect Four, for example, we used decision trees to demonstrate the inner workings of the relevant loops and conditionals. Later, when we were learning to harness random simulation, we drew flowcharts to capture the various tradeoffs and comparisons. We never relied on miracles. We always had intuitions and explanations for why a given strategy or algorithm makes sense.

But the cutting edge of computer science today includes strategies that are truly black boxes in the sense that even the most sophisticated coders do not know exactly what logic the computer is using to solve the problem at hand. In these cases, the programmer knows that the computer is using mathematical relationships to link inputs to outputs. And the programmer knows that those mathematical relationships derive from training data provided to, and then analyzed by, the machine. But exactly which clues power the computer’s decisions? Exactly what justifies the math happening under the hood? The answer is not a miracle but a black box: a solution strategy that the computer itself works out, and one that the programmer neither suggests ahead of time nor can fully articulate afterward.

And in this chapter, we are going to build our own.

Consider again the game rock-paper-scissors. This time, we are going to ask the computer to help us distinguish human players who are playing randomly from those who are reluctant to repeat the same move twice in a row. As we learned last chapter, if we can distinguish between these two types of players, we gain a significant advantage in the game. For instance, if we know we are playing a reluctant repeater who has just played rock, in the next round we can favor scissors, a move that is particularly good against paper, ties with scissors, and is only a disaster against the now-unlikely-to-be-played rock. If our reluctant repeater played scissors, by contrast, we would similarly know to favor paper in the next round; if paper, we’d lean toward rock.

When we tackled this challenge last time, we as programmers very much led the charge, first by crafting specific functions like ISRELUCTANT() and ISBORING(), and later by designing tailored data structures like WHENROCKWON and WHENSCISSORSLOST. This time, we are instead going to let the computer figure things out by itself. Our only contributions will be giving the computer some examples of each player type as training data and creating a flexible set of variables and relationships that the computer can use to experiment with its own mathematical approach.

Let’s start with those training examples. Our ultimate goal here is for the computer to be able to monitor a human player over a few rounds of the game and then determine, with confidence, whether that player plays randomly or reluctantly. To ready the computer for this task, let’s write two functions to create some training data: a MAKERANDOM() function that generates a sequence of nine truly random moves, and a MAKERELUCTANT() function that generates a sequence of nine moves in which repetition is rare. To keep things simple, let’s represent those moves using the number 1 for rock, 2 for paper, and 3 for scissors; and, for each of these fictional game histories, let’s also add a tenth entry to indicate whether the resulting pattern was generated randomly (0) or instead generated under a reluctance constraint (1).
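
The two generators described above might be sketched in Python as follows. This is only an illustration under stated assumptions: the lowercase names `make_random` and `make_reluctant`, the `repeat_chance` parameter and its value of 0.1, and the sample counts are not from the text, which says only that repetition is "rare."

```python
import random

ROCK, PAPER, SCISSORS = 1, 2, 3
MOVES = [ROCK, PAPER, SCISSORS]

def make_random(length=9):
    """Return `length` truly random moves followed by the label 0."""
    history = [random.choice(MOVES) for _ in range(length)]
    return history + [0]

def make_reluctant(length=9, repeat_chance=0.1):
    """Return `length` moves that rarely repeat, followed by the label 1.

    repeat_chance is an assumed value chosen for illustration.
    """
    history = [random.choice(MOVES)]
    while len(history) < length:
        previous = history[-1]
        if random.random() < repeat_chance:
            # Occasionally allow a repeat, since repeats are rare, not banned.
            history.append(previous)
        else:
            # Otherwise pick one of the two moves that differ from the last.
            history.append(random.choice([m for m in MOVES if m != previous]))
    return history + [1]

# A labeled training set: half random players, half reluctant ones.
test_data = [make_random() for _ in range(100)]
test_data += [make_reluctant() for _ in range(100)]
```

Each sample is a list of ten numbers: nine moves and then the label. Setting `repeat_chance` to 0 would produce players who never repeat at all.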

That’s how we’ll generate training data. Next, we need to introduce a real human player. For each sample in our training set, we just generated a fake history of nine moves. For the real player we hope to evaluate, we should therefore likewise gather a history of nine moves. From there, we are hoping to generate an estimate as to how likely it is that this specific human player is playing randomly. Given that we used a 0 to indicate random patterns in the training data and a 1 to indicate reluctance, let’s define that output such that it also ranges from 0 to 1, with numbers closer to 0 indicating that the human player is probably random and numbers closer to 1 indicating that the human player is probably reluctant. Adding the black box, our program structure looks something like the network depicted below, where the nine games on the left of the diagram are either nine sample games from the training data or the nine real games from the human player; there is a black box to process those inputs; and then, on the right, the computer produces its best guess as to whether those nine games represent random or reluctant behavior.

So what’s inside the black box? Somewhat arbitrarily, let’s imagine that inside the box the computer will process the available information in two steps. In the first step, the computer will consider the nine games in whatever combinations the computer thinks appropriate and draw some initial conclusions. Then, in the second step, the computer will take whatever it learned from that first round of analysis and draw further, sharper conclusions. If we label the first-round analyses A1, A2, and so on, and we label the second set of inferences as B1, B2, and so on, we can sketch the black box as a network where each game influences, say, five first-level “A” conclusions; each A conclusion in turn influences five second-level “B” conclusions; and the B conclusions then combine to determine the final output, which again is the computer’s prediction as to whether this player plays randomly or reluctantly.

Just to be clear, there is nothing magical about having two stages to the analysis, and nothing magical about having five intermediate conclusions per stage. Indeed, when you play with the code, you might discover that additional stages, or fewer intermediate conclusions, yield better or faster results. For now, however, let’s arbitrarily commit to a network of this general structure: a set of five A-level first conclusions, each based on some relevant group of nine games; a set of five B-level second conclusions, each based on the five A-level results; and one output, based on those B-level values.

With those connections established, we can now figure out more precisely which information should matter to which intermediate conclusion. Focus on that first game as an example. Perhaps the player’s choice in that first game will be deemed incredibly important to the intermediate decision at position A2, incredibly important to the intermediate decision at position A4, and then almost irrelevant to the conclusions represented by intermediate positions A1, A3, and A5. Representing high influence with thick lines and low influence with thin lines, that would give us a network like the one shown on the page that follows, where game 1 has strong connections to A2 and A4 but only faint connections to A1, A3, and A5.

This diagram focuses on the first game and its potential influence on the first set of tentative conclusions. Thick lines indicate a relationship where the first game has a significant influence on the connected node’s value. Thin lines indicate a less meaningful connection between the two.

All the other games and nodes can similarly be weighted according to their own unique patterns. Perhaps the second game will be emphasized in whatever analysis is represented by nodes A2 and A5, and maybe the third game will have a strong role in the intermediate conclusions represented by nodes A4 and A5. Maybe A3 and A4 will in turn play big roles in the decision at B3, while A5 will have disproportionate impact on nodes B2 and B5. If we continue to use line thickness as a proxy for a node’s influence on the next node downstream, the implication is that we can imagine an almost infinite number of plausible networks, including but in no way limited to the four depicted in the graphic at the top of the next page.

So . . . which network do we choose? Should we tell the computer that the first game ought to be allowed to heavily influence whatever is represented by intermediate decision A3? Should we make intermediate decision A4 critical to next-level decision B1 but almost irrelevant to next-level decision B3? What about the relationships between the fourth game, decision A2, and decisions B3, B4, and B5? And what does an intermediate decision like A3 or B4 really stand for anyway? In short, how do we know which of these many networks to actually build?

The answer is that we do not know, and that’s fine because the choice is not ours to make. This is the black box. Once we create the framework, it is up to the computer to test various options and decide which set of “weights” or thicknesses best fits the training data. Whatever weights come closest to accurately classifying the sample games will become the weights the computer uses to analyze real players. The human programmer has no role in the decision. Having created variables sufficient to represent the network, the programmer now sits back and allows the computer to turn the various knobs and flick the various switches, searching for the pattern that most accurately handles all the available training data.

Shown are four possible networks that could be built using our general framework of nine inputs, five first-level nodes, five second-level nodes, and one output.

Let’s use the array TESTDATA to store our training data, where TESTDATA[1] is one set of nine games plus a label that indicates whether that player was random or reluctant, TESTDATA[2] is another set of nine games plus its label, TESTDATA[1][1] is the move the first player made in that player’s first game, TESTDATA[2][1] is the move the second player made in the second player’s first game, and TESTDATA[2][10] is the label assigned to that second player. The computer will need to test its network against each of these examples one by one, so when the computer is running its tests, let’s store the sample under review in a list we can call GAMEDATA. At any given moment, then, GAMEDATA[1] is one particular player’s first move, GAMEDATA[2] is that same player’s second move, GAMEDATA[9] is that same player’s ninth move, and GAMEDATA[10] holds that player’s label, which again is a 0 if that player’s history was generated by MAKERANDOM() and a 1 if that player’s history was generated by MAKERELUCTANT().

Continuing with our variable definitions, each game in a given sample connects to A1, which, as we see in the diagrams, means that there are nine lines that come into the A1 node, one from each game. We can thus name a variable A1 and set it up to track those nine entries, with A1[1] being a numeric representation of the weight of the line between node A1 and game 1, and A1[2] being a numeric representation of the weight of the line between node A1 and game 2. We can then sneak a tenth entry into that list, which we will use later to store a calculated, final value for node A1. That is, node A1 will ultimately be represented as a number that reflects the GAMEDATA as weighted by the values stored in A1[1], A1[2], and so on, through A1[9]. We will store that calculated value in A1[10] so that we can use it as an input for downstream nodes B1, B2, B3, B4, and B5.

Of course, everything we just said about node A1 also applies to nodes A2 through A5. That is, node A2 also needs a corresponding variable A2 that can track its nine incoming weights and store its final calculated value. So do nodes A3, A4, and A5. And then the B nodes need similar lists, which in a surge of wild creativity we will call B1, B2, B3, B4, and B5. These variables are a little simpler because they need only five incoming weights (one connecting the B node to A1, one connecting the B node to A2, and so on, through A5) plus one calculated value that will be built from those weights as applied to the then-current values stored in A1[10], A2[10], A3[10], A4[10], and A5[10]. The list variable OUTPUT is the last variable we need. It should store the five weights that connect the B nodes to the network’s final output node, and then also store that node’s own calculated final value, which will be based on the various B node values and the various B-to-output weights. Writing all that formally, we get something pretty clean, shown below.
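
One possible Python rendering of these variables is sketched here. It is an assumption about layout rather than the book's own listing: the names `a_nodes`, `b_nodes`, `output`, and `game_data` are illustrative, and because Python lists count from 0, the entry the text calls A1[10] becomes `a_nodes[0][9]`.

```python
# Each A node: nine incoming weights (one per game) plus a final slot
# that will later hold the node's calculated value.
a_nodes = [[0.0] * 10 for _ in range(5)]   # A1 through A5

# Each B node: five incoming weights (one per A node) plus a value slot.
b_nodes = [[0.0] * 6 for _ in range(5)]    # B1 through B5

# The output node: five incoming weights (one per B node) plus a value slot.
output = [0.0] * 6

# The sample under review: nine moves followed by the 0/1 label.
game_data = [0] * 10
```

Grouping the five A lists into one list of lists is a convenience that lets later code loop over the nodes instead of naming each one.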

We now have variables ready to store all the numbers, so it is time to do the calculations. To calculate the value that should be assigned to node A1, we need to take the nine inputs relevant to A1 and weight them according to the weights listed in the variable A1. Game 1, for example, is represented by the number stored in variable GAMEDATA[1] and should be assigned the numeric weight stored at A1[1]. One part of the calculated value for A1 is thus the product we get when we multiply GAMEDATA[1] by A1[1]. Game 2 is represented by a number stored in variable GAMEDATA[2] and should be assigned the numeric weight stored in A1[2]. Thus another part of the calculated value for A1 is the product we get when we multiply GAMEDATA[2] by A1[2]. Repeating that for all nine games, we get a somewhat monstrous expression, but one in which the value of A1 is determined by the combination of nine game/weight pairs.

To calculate the value stored in A1[10], we take the weights stored in A1[1], A1[2], and so on, and we multiply them by the values stored in GAMEDATA[1], GAMEDATA[2], and so on. As the computer changes the weights, different games will have correspondingly more or less influence on the final calculated value.

And then there is one last wrinkle. Calculations like the one sketched above can yield almost any number. The value of node A1 could in theory calculate to the number 42, the number 251.25, or even the number −3.98. Wild variability along these lines would make the overall process incredibly difficult, as the computer would have too many options to compare. So we need to artificially constrain the calculation to some narrow range. One conventional approach is to scale the calculation such that it always ends up somewhere between 0 and 1. This requires a function we will dub SMUSH() that accepts as input any number between positive and negative infinity and returns as output a correspondingly scaled number between 0 and 1. The actual math behind SMUSH() is something called the sigmoid function, but, for our purposes, we need only understand that SMUSH() is a math operation that maps every number back to a specific position in the narrow 0 to 1 space. Our calculation for A1[10] thus becomes the long summation shown above, with the result then passed through SMUSH().
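
To make the sum-then-smush step concrete, here is a minimal Python sketch for a single node. The sigmoid formula is standard; the helper name `node_value` and the sample weights are assumptions made for illustration.

```python
import math

def smush(x):
    """Sigmoid function: map any real number into the range 0 to 1."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    # For negative inputs, use an equivalent form that avoids overflow.
    z = math.exp(x)
    return z / (1.0 + z)

def node_value(weights, inputs):
    """Multiply each input by its weight, add the products, then smush."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return smush(total)

# Example: nine moves and nine illustrative weights for one A node.
moves = [1, 3, 2, 1, 2, 3, 1, 3, 2]
weights = [0.2, -0.4, 0.1, 0.0, 0.3, -0.5, 0.25, 0.1, -0.2]
print(node_value(weights, moves))
```

An input of 0 lands exactly in the middle, at 0.5. Large positive sums are pushed toward 1 and large negative sums toward 0.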

The computer needs to run that same math for all the nodes in the network. That is, we need to do a summation and smush for A2, a summation and smush for A3, and so on, for every node, A1 through A5, B1 through B5, and OUTPUT, always storing the result as the last entry in the relevant list variable. The resulting code is shown below.
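
The full pass through the network could look something like the following Python sketch. It is not the book's listing: the function signature, the helper `weighted_value`, and the choice to keep each node's value in the last slot of its list are assumptions consistent with the description above.

```python
import math

def smush(x):
    """Sigmoid function, written to avoid overflow for large inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def weighted_value(node, inputs):
    """Sum each input times its weight (all but the node's last slot), then smush."""
    weights = node[:-1]
    return smush(sum(w * x for w, x in zip(weights, inputs)))

def crazy_math(game_data, a_nodes, b_nodes, output):
    """Compute every node value and return the network's prediction."""
    moves = game_data[:9]                      # ignore the label in slot ten
    for node in a_nodes:                       # first-level conclusions
        node[-1] = weighted_value(node, moves)
    a_values = [node[-1] for node in a_nodes]
    for node in b_nodes:                       # second-level conclusions
        node[-1] = weighted_value(node, a_values)
    b_values = [node[-1] for node in b_nodes]
    output[-1] = weighted_value(output, b_values)
    return output[-1]
```

With every weight set to 0, each node computes `smush(0)`, so the prediction is 0.5. That is a quick sanity check that the plumbing is connected correctly.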

Now, finally, we can set the computer free. With these variables created and functions declared, our main program starts by picking initial values for the various weights. As with so much of our black-box design, our choices here are and can be arbitrary, so let’s just randomly assign starting weights between −0.5 and +0.5 using Python’s built-in function RANDOM.RANDOM(). That function generates numbers between 0 and 1, so subtracting 0.5 gives us our desired range from −0.5 to +0.5. Note that we are including both positive and negative numbers because we do not want to put a thumb on the scale one way or the other. We want to let the computer ultimately decide whether a given input should increase or decrease some later node value.
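
A short Python sketch of that initialization follows. It relies on `random.random()`, which returns a float of at least 0.0 and less than 1.0; the helper names `random_weight` and `new_node` are assumptions introduced for illustration.

```python
import random

def random_weight():
    """Return a starting weight between -0.5 and +0.5."""
    return random.random() - 0.5

def new_node(incoming):
    """Create a node: `incoming` random weights plus a slot for its value."""
    return [random_weight() for _ in range(incoming)] + [0.0]

a_nodes = [new_node(9) for _ in range(5)]   # nine games feed each A node
b_nodes = [new_node(5) for _ in range(5)]   # five A nodes feed each B node
output = new_node(5)                        # five B nodes feed the output
```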

Using those starting values, the computer can now go through the training data one sample player at a time, running a function we will call CRAZYMATH() to calculate all the node values based on the nine games under review and the then-current weights. At that point, the computer will compare the resulting black-box prediction to the known, labeled answer, using the difference between those numbers—actually, the square of the difference—as a measurement of network error. Why the square? Because squaring means that the computer will react more to large misses than to small ones. If the player under review played randomly and thus the correct output is 0, a prediction of 0.20 will count as 0.04 when we square the error, but a prediction of 0.4, when squared, will count as a much larger error of 0.16. The necessary code is shown below.
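
The error measurement can be sketched in a few lines of Python. The names `squared_error` and `total_error` are assumptions, and `predict` stands in for whatever function turns one sample into a prediction, such as a call to CRAZYMATH() with the current weights.

```python
def squared_error(prediction, label):
    """Square the gap so that big misses count far more than small ones."""
    return (prediction - label) ** 2

def total_error(test_data, predict):
    """Add up the squared error over every labeled sample.

    Each sample is nine moves followed by its 0/1 label; `predict` is any
    function that maps a sample to a number between 0 and 1.
    """
    return sum(squared_error(predict(sample), sample[-1])
               for sample in test_data)

# The two misses from the text, both for a random player (label 0).
print(squared_error(0.2, 0))   # roughly 0.04
print(squared_error(0.4, 0))   # roughly 0.16
```

Doubling the size of the miss quadruples the penalty, which is why the network is pushed hardest to fix its worst predictions.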

After the computer tests all the training samples and sums the resulting errors, the computer’s next job is to find even more accurate weights. To do that, we need to create a second network, one that we can ultimately compare to the first. We can keep the same naming conventions and just double up the various letters, so that our second network will have nodes like AA1, AA2, AA3, AA4, AA5, BB1, BB2, and so on, plus an output we can call OTHEROUTPUT. Instead of filling in these weights completely at random, however, this time we can take the weights from the first network and nudge them a bit, either up or down, using a maximum random shift of 0.05 per entry. Our second network will thus be pretty similar to the first one but with minor weight differences.

The computer can now run the training data again, this time using the new network and calculating a new total error. If the result is lower total error than the original total error, the new weights become the new starting point. If the new total error is higher, however, the computer deletes the nudged weights and goes back to the original ones. Either way, from there the computer makes a new set of small random nudges and once again looks to see if the resulting network generates a lower total error score across all the training data. The process ends when either the computer has run all this math an excessive number of times or the error is so small that the computer can declare victory.
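
The nudge-and-compare loop can be sketched generically in Python. This is a simplified illustration, not the book's code: the network is assumed to be a list of layers, each layer a list of nodes, each node a list of weights followed by one value slot; `error_of` is any function that returns a network's total error; and the stopping thresholds are arbitrary.

```python
import copy
import random

def nudge(network, max_shift=0.05):
    """Return a copy of the network with every weight shifted slightly."""
    candidate = copy.deepcopy(network)
    for layer in candidate:
        for node in layer:
            for i in range(len(node) - 1):   # last slot is a value, not a weight
                node[i] += random.uniform(-max_shift, max_shift)
    return candidate

def train(network, error_of, max_rounds=10000, good_enough=0.01):
    """Keep a nudged network only when it lowers the total error."""
    best_error = error_of(network)
    for _ in range(max_rounds):
        if best_error <= good_enough:        # small enough to declare victory
            break
        candidate = nudge(network)
        candidate_error = error_of(candidate)
        if candidate_error < best_error:     # better: adopt the new weights
            network, best_error = candidate, candidate_error
        # otherwise: discard the candidate and nudge the old weights again
    return network, best_error
```

Because a candidate is adopted only when its error is lower, the total error never increases from one round to the next. The price is that the search can stall at weights that are merely better than their close neighbors.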

As you might already suspect, what we have just built is a very stripped-down neural network. That name is a nod to the structure of the human brain. Like the computer’s network of interconnected nodes, the human brain is a network of interconnected nerve cells called neurons. Nodes and neurons both receive electrical signals. They both alter those signals in support of various analytical goals. And they both then relay their outputs to other nodes/neurons, ideally empowering some really good reactions and decisions. Admittedly, what happens in the brain is much more complicated than what happens in our code, but the idea is the same. And at the cutting edge of computer science, researchers have found countless ways to add sophistication to the basic framework we have sketched thus far.

One example? In our code, when we compare the error associated with the two suggested networks, we use a relatively simple analysis to pick the eventual winner. Nothing turns on whether the second network seems to be a lot more accurate, a little more accurate, a lot less accurate, or a little less accurate. Instead, no matter what, if the second network is more accurate, we adopt all of those weights and start the process anew, and if the second network is less accurate, we ignore all of those weights and restart the hunt. A better approach might be to calibrate any changes by factoring in the size of any error gap. Perhaps in response to a large error measurement we should make large changes to the existing weights, but in response to a small error measurement we should take smaller, more cautious steps. The idea would be to use error measurements for more than just a binary yes/no decision, recognizing that error measurements contain more information than just the answer to the question of whether one network is better than the other.

Another example is that our CRAZYMATH() function relies primarily on multiplication. We calculate a value for node A1, for instance, by multiplying GAMEDATA[1] by A1[1], adding GAMEDATA[2] times A1[2], adding GAMEDATA[3] times A1[3], and so on, through GAMEDATA[9] times A1[9]. More sophisticated neural networks do that, too, but they also introduce a not-multiplied component, which is just some node-specific number like 5, or 32.4, or −71. The idea is to introduce another way for the computer to prioritize or downplay each node. For instance, if a node is assigned some large negative number, the SMUSH() value for that node will be driven toward 0 until that node’s multiplied values grow so large that they exceed the added negative. Conversely, if a node is assigned some giant positive number, the SMUSH() value for that node will be driven toward 1 even when the multiplied values are small. Allowing the computer to make this type of adjustment can, in some networks, improve accuracy even further. And, like all the other numbers being used, the computer finds these node-specific values itself, in essence by trial and error.
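
The effect of that added, not-multiplied number (commonly called a bias) can be seen in a small Python sketch. The function name `node_value_with_bias` and the sample weights are assumptions for illustration.

```python
import math

def smush(x):
    """Sigmoid function, written to avoid overflow for large inputs."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def node_value_with_bias(weights, inputs, bias):
    """Weighted sum plus a node-specific number, then smush."""
    total = sum(w * x for w, x in zip(weights, inputs)) + bias
    return smush(total)

moves = [1, 2, 3, 1, 2, 3, 1, 2, 3]
weights = [0.1] * 9                      # weighted sum is about 1.8

print(node_value_with_bias(weights, moves, 0))      # no bias
print(node_value_with_bias(weights, moves, -71))    # driven toward 0
print(node_value_with_bias(weights, moves, 32.4))   # driven toward 1
```

In a network that uses this idea, each node's bias would be nudged during training exactly like the weights.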

Chapter Challenge

Our code right now distinguishes between random and reluctant players, but our network would be even more valuable if it could draw additional distinctions. Your challenge this chapter is to improve our network such that it can identify additional imperfections that human players bring to the game rock-paper-scissors.

To start, add a function called MAKEROCKY() that creates sample players who disproportionately choose rock over the alternatives. Can you build a network that distinguishes rock-favoring players from reluctant ones? Can you build a network that handles random, reluctant, and “rocky” players at the same time, maybe with one output that indicates the likelihood that the new player is rocky, a second output that indicates the likelihood that the new player is reluctant, and a third output that indicates the likelihood that the new player is random? Can you go even further, adding and then distinguishing even more human quirks by introducing functions like SORELOSER() and ALWAYSPAPER()?

12

Minimizing Regret

Hand a plastic ball to a toddler, and at first they will do anything but use it as a ball. The child might chew on it. They might squeeze it between their hands to see if it can be deformed. They might even attempt to put it into an ear. But at some point, maybe accidentally, they will drop that ball to the ground and realize its most marvelous feature: it rolls.

Computers, too, can learn by doing. We can tell the computer that tic-tac-toe is an iterative game where players take turns placing their marks on a nine-square grid and the winner is the first player to achieve three of their own marks in a row. We can create helpful variables that allow the computer to record its experience as it plays the game blindly, and we can even write code such that the computer can play against some other computer and then the two can learn by quickly playing thousands of games together. But the important point is that we can stop there. No historic training data required. No need for the programmer to pick some game-specific algorithm. Instead, the programmer need only explain the rules of the game and then provide an opportunity for the computer to functionally chew on the ball, hide it in an ear, and, hopefully, figure out how to make it marvelously roll.

One machine learning technique along these lines is a relatively new approach called counterfactual regret minimization. Here’s how it works. Imagine that the computer is learning the game rock-paper-scissors and, in a given round, makes the completely arbitrary decision to play paper while some opposing computer happens to play scissors. Right away, the original computer learns something: in this example, at least, that computer would have been a little better off had it played scissors and a lot better off had it played rock. Suppose that the computer plays another round, perhaps this time playing scissors while its opponent by chance plays rock. The computer learns something more. This time, our computer “regrets” not having played rock, which would have earned a tie, and particularly regrets not having played paper, which would have earned a win.

Count a loss as −1, a tie as 0, and a win as +1, and the computer can quantify this information as follows. In the first game, the computer learned that it would have done 2 points better had it played rock instead of paper and 1 point better had it played scissors instead of paper. In the second game, the computer learned that it would have ended up with 2 more points had it played paper instead of scissors and 1 more point had it played rock. Adding those results together, the computer now knows that, in its experience so far, it could have earned 3 more points by playing rock at the right time, 2 more points by playing paper at the right time, and 1 more point by playing scissors at the right time. We can thus think of the computer as suffering 3 points of regret from not playing rock, 2 points of regret from not playing paper, and 1 point of regret from not playing scissors.
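
The regret arithmetic from those two rounds can be reproduced with a short Python sketch. The helper names `payoff` and `add_regret`, and the choice to number the moves 0, 1, and 2, are assumptions made for illustration.

```python
ROCK, PAPER, SCISSORS = 0, 1, 2

def payoff(mine, theirs):
    """Return +1 for a win, 0 for a tie, and -1 for a loss."""
    if mine == theirs:
        return 0
    # Each move beats the one just before it: paper beats rock,
    # scissors beats paper, and rock (wrapping around) beats scissors.
    return 1 if (mine - theirs) % 3 == 1 else -1

def add_regret(regret, mine, theirs):
    """Add, for each move, how much better it would have done this round."""
    actual = payoff(mine, theirs)
    for alternative in (ROCK, PAPER, SCISSORS):
        regret[alternative] += payoff(alternative, theirs) - actual
    return regret

regret = [0, 0, 0]                        # regret for rock, paper, scissors
add_regret(regret, PAPER, SCISSORS)       # round 1: paper loses to scissors
add_regret(regret, SCISSORS, ROCK)        # round 2: scissors loses to rock
print(regret)                             # [3, 2, 1]
```

Both example rounds are losses, so every entry grows or stays the same. In a round the computer wins, the same subtraction would yield negative numbers for the alternatives, reflecting that switching would have been worse.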

What strategy would this process suggest were the computer to calculate its regret across hundreds or even thousands of games? Let’s find out. Define COMPUTER1STRATEGY to be a list of three decimals, each between 0 and 1, where the first represents the likelihood that the first computer will play rock, the second the likelihood that it will play paper, and the third the likelihood that it will play scissors. A COMPUTER1STRATEGY of [1, 0, 0] thus translates to always playing rock. A COMPUTER1STRATEGY of [0.4, 0.4, 0.2] means playing rock 40 percent of the time, paper 40 percent of the time, and scissors 20 percent of the time. The variable COMPUTER2STRATEGY can then be another three-decimal list keeping track of the same information but for computer 1’s opponent, the intimidating computer 2.

Next, define COMPUTER1REGRET and COMPUTER2REGRET to be lists that keep track of each machine’s experienced regret, separated by type of move. So, for example, if in past games computer 1 could have earned 4 more points by playing rock, 2 more by playing paper, and 5 more by playing scissors, COMPUTER1REGRET would be, in order, the list [4, 2, 5]. Likewise, if in past games computer 2 could have earned 3 more points by playing rock, 4 more by playing paper, and 2 more by playing scissors, COMPUTER2REGRET would show [3, 4, 2].

Now we need a function, let’s call it SCORE(), that can take as input the two players’ current strategies and return as output the expected outcome from the requesting player’s perspective. For example, if computer 1 is playing the strategy [1, 0, 0] and computer 2 is playing the strategy [0, 1, 0], SCORE(COMPUTER1STRATEGY, COMPUTER2STRATEGY) should return the value −1 because, in words, computer 1 is playing rock while computer 2 is playing paper, which from computer 1’s perspective will generate a loss.

This function needs to be more complicated than it might at first seem because, as the computers learn, their approaches might turn out to be mixed strategies like [0.3, 0.2, 0.5] rather than mathematically straightforward strategies like [0, 0, 1] or [0, 1, 0]. That is, SCORE() might plausibly be called with computer 1 using the sometimes-play-rock, sometimes-play-paper, sometimes-play-scissors strategy [0.2, 0.3, 0.5] and computer 2 playing the similarly complicated responsive strategy [0.4, 0.6, 0]. To evaluate outcomes, SCORE() thus has to consider nine possible combinations. In essence, the function must evaluate the odds that computer 1 plays rock while computer 2 plays rock, computer 1 plays rock while computer 2 plays paper, computer 1 plays rock while computer 2 plays scissors, computer 1 plays paper while computer 2 plays rock, computer 1 plays paper while computer 2 plays paper, and on and on, for all three computer 1 and all three computer 2 possibilities.

Nine sample calculations are shown in the table below, using the example where computer 1 plays [0.2, 0.3, 0.5] and computer 2 plays [0.4, 0.6, 0]. The first row focuses on the possibility of rock/rock. As shown, there is a 20 percent chance that computer 1 will play rock and a 40 percent chance that computer 2 will play rock, so the likelihood that both will happen simultaneously is simply their product, 0.4 times 0.2, or 0.08. In those instances, computer 1 earns 0 points. The fourth row focuses on a more interesting pair, paper/rock. There is a 30 percent chance that computer 1 will play paper and a 40 percent chance that computer 2 will play rock, so the likelihood of this pair is 0.3 times 0.4, which is 0.12. The result in those instances is +1, which represents a win for computer 1. The bottom line of the chart shows the total across all nine pairings, which in this example turns out to be a net positive payoff of 0.10 for computer 1. This is the number that SCORE() will return as output.
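
The nine rows described above can be recomputed directly. The following is a minimal sketch, assuming the usual payoffs of +1 for a win, 0 for a tie, and -1 for a loss from computer 1's perspective; the names `MOVES` and `PAYOFF` are illustrative and do not come from the book.

```python
# Payoff to computer 1 for each (computer 1 move, computer 2 move) pair.
# Moves are ordered rock, paper, scissors, matching the strategy lists.
MOVES = ["rock", "paper", "scissors"]
PAYOFF = [
    [0, -1, 1],   # computer 1 plays rock
    [1, 0, -1],   # computer 1 plays paper
    [-1, 1, 0],   # computer 1 plays scissors
]

computer1_strategy = [0.2, 0.3, 0.5]
computer2_strategy = [0.4, 0.6, 0.0]

total = 0.0
for i in range(3):
    for j in range(3):
        # Chance that this exact pairing occurs.
        likelihood = computer1_strategy[i] * computer2_strategy[j]
        # What the pairing adds to computer 1's expected payoff.
        contribution = likelihood * PAYOFF[i][j]
        total += contribution
        print("%-8s / %-8s  likelihood %.2f  result %+d  contribution %+.2f"
              % (MOVES[i], MOVES[j], likelihood, PAYOFF[i][j], contribution))

print("Total expected payoff for computer 1: %+.2f" % total)
```

The total matches the net payoff of 0.10 stated in the text.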

Sample code is shown on the next page. Remember, the inputs here are two three-number variables that list, in order, the likelihood that the relevant player will play rock, the likelihood that the relevant player will play paper, and the likelihood that the relevant player will play scissors.
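
The book's own listing is not reproduced in this text, so what follows is only a minimal sketch of how a function like SCORE() might look. The lowercase names and the `PAYOFF` table are assumptions made for illustration.

```python
# Payoff to the requesting player; rows are that player's move and
# columns are the opponent's move, both ordered rock, paper, scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]

def score(my_strategy, your_strategy):
    """Return the expected outcome from the requesting player's side."""
    expected = 0.0
    for mine in range(3):
        for yours in range(3):
            # Likelihood of this pairing times what it is worth.
            likelihood = my_strategy[mine] * your_strategy[yours]
            expected += likelihood * PAYOFF[mine][yours]
    return expected

# Always rock against always paper: a certain loss for the first player.
print(score([1, 0, 0], [0, 1, 0]))

# The mixed-strategy example from the text.
print(round(score([0.2, 0.3, 0.5], [0.4, 0.6, 0]), 2))
```

Because the function is written from the requesting player's perspective, computer 2 would call it with the arguments in the opposite order.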

Our main program can now be a simple loop where the computers play the game, calculate their respective regrets, and then adjust before playing anew. The loop starts with computer 1 calling SCORE() to calculate the EXPECTEDOUTCOME associated with its current strategy. Next, computer 1 needs to score the alternatives where it would abandon its current strategy and instead play rock exclusively, or paper exclusively, or scissors exclusively. Elsewhere in the code, we can define PLAYROCK to mean [1, 0, 0] and thus here we can write very readable code: we store in POSSIBLEOUTCOME the result we get from calling SCORE() with computer 1 playing PLAYROCK while computer 2 sticks with COMPUTER2STRATEGY. If that POSSIBLEOUTCOME is better than the current EXPECTEDOUTCOME, the computer stores the resulting regret. Then the computer continues to evaluate its other options, specifically by testing the PLAYPAPER and PLAYSCISSORS possibilities and storing those regret values, too.
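
The regret-measuring step just described might be sketched as follows. This is an illustration under stated assumptions rather than the book's listing: the helper name `add_regret` is hypothetical, and `score()` is repeated only so that the block runs by itself.

```python
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # rock, paper, scissors

PLAY_ROCK = [1, 0, 0]
PLAY_PAPER = [0, 1, 0]
PLAY_SCISSORS = [0, 0, 1]

def score(my_strategy, your_strategy):
    """Expected outcome for the player whose strategy is listed first."""
    return sum(my_strategy[i] * your_strategy[j] * PAYOFF[i][j]
               for i in range(3) for j in range(3))

def add_regret(my_strategy, your_strategy, my_regret):
    """Add this round's regret to the running totals, move by move."""
    expected_outcome = score(my_strategy, your_strategy)
    alternatives = [PLAY_ROCK, PLAY_PAPER, PLAY_SCISSORS]
    for move, alternative in enumerate(alternatives):
        possible_outcome = score(alternative, your_strategy)
        # Regret is recorded only when the alternative would have done better.
        if possible_outcome > expected_outcome:
            my_regret[move] += possible_outcome - expected_outcome
    return my_regret

# Computer 1 always plays rock while computer 2 always plays paper.
computer1_regret = add_regret(PLAY_ROCK, PLAY_PAPER, [0, 0, 0])
print(computer1_regret)
```

In this example paper would have turned the loss into a tie and scissors would have turned it into a win, so those two entries grow while the rock entry stays at zero.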

The last step for computer 1 is now to update its strategy in a way that accounts for all that regret. To do so, computer 1 needs to first add up all the points that it has forsaken to date. Then, computer 1 can calculate three fractions: the fraction of the total points it would have earned had it played rock in games where that was a better move, the fraction of the total points it would have earned had it played paper in games where that was a better move, and the fraction of the total points it would have earned had it played scissors in games where that was a better move. Those fractions become computer 1’s new strategy. From now on, the frequency with which computer 1 will play rock will be the fraction the computer calculated above based on the points forsaken by not playing rock. Similarly, the frequency with which computer 1 will play paper and scissors will be the fractions the computer calculated based on the points forsaken by not playing paper and not playing scissors. And then a similar process can be used by computer 2 to update its strategy. The result? The more regret associated with not playing a given move in past games, the more that move will be played in future games.
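
Turning accumulated regret into a new strategy amounts to dividing each entry by the total. A minimal sketch follows; the function name `update_strategy` is hypothetical, and falling back to an even mix when no regret has been recorded yet is an assumption, since the text does not say what to do in that case.

```python
def update_strategy(regret):
    """Convert per-move regret totals into play frequencies."""
    total_regret = sum(regret)
    if total_regret == 0:
        # Assumption: with nothing to regret so far, play every move equally.
        return [1 / len(regret)] * len(regret)
    return [move_regret / total_regret for move_regret in regret]

# The earlier example: 4 points forsaken by not playing rock,
# 2 by not playing paper, and 5 by not playing scissors.
computer1_strategy = update_strategy([4, 2, 5])
print(computer1_strategy)
```

With those numbers the eleven forsaken points split into 4/11 for rock, 2/11 for paper, and 5/11 for scissors, so the move with the most regret becomes the move played most often.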

At this point, you know that both computers ended up playing rock, paper, and scissors with equal likelihood. That is the strategy to which human players also aspire, and it has an important property: If I play randomly like this, there is no better strategy for you, my opponent, than to play randomly in response; and, even if I know you are going to play randomly in response, I have no incentive to change my strategy either. Put differently, me playing randomly and you responding randomly is a stable equilibrium, game after game after game. And the cool thing so far is that, thanks to these regret calculations, the computer figured this out all by itself, with no hints from us and no training data. The computer simply played the game repeatedly and kept track of the regret it experienced whenever it made one move but would have been better off making some other move. Over time, that pattern of regret made clear that the best approach is simply to randomize across the three available moves. Not bad.

But we can do something even more impressive: we can apply this same approach to much more complicated games. Consider the war game Colonel Blotto. Each player commands an army consisting of five battalions. There are three separate battlefields in dispute, and players must separately decide how many of their battalions to commit to each field. For example, the first player might decide to send two battalions to the first field, two battalions to the second field, and one battalion to the third field. Or the first player might instead decide to send four battalions to the first field, one to the second field, and none to the third field. Battle victories are determined strictly by comparing the number of battalions deployed by each side. If the first player has more battalions committed to a given battlefield than does the second player, the first player wins that field. If the second player has more, the second player wins. Obviously, the goal here is to win as many battlefields as possible while losing as few as possible.

Just to make sure that the game dynamics are clear, imagine that the first player commits its troops such that three battalions are sent to the first battlefield and one battalion is sent to each of the other two. At the same time, imagine that the second player, without knowing anything about the first player’s deployments, deploys its troops such that two battalions are sent to the first battlefield, two are sent to the second battlefield, and one is sent to the third battlefield. Scoring those decisions, the first player wins the first battlefield, the second player wins the second battlefield, and the third battlefield ends in a tie.
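
The battlefield-by-battlefield comparison can be written in a few lines. This is a sketch, with the function name `battle_score` chosen for illustration; it counts +1 for each field won and -1 for each field lost, in line with the net scoring the chapter uses.

```python
def battle_score(mine, yours):
    """Net battlefields won by the first deployment."""
    net = 0
    for my_troops, your_troops in zip(mine, yours):
        if my_troops > your_troops:
            net += 1      # more battalions on this field: a win
        elif my_troops < your_troops:
            net -= 1      # fewer battalions on this field: a loss
        # equal numbers are a tie and change nothing
    return net

# First player sends [3, 1, 1]; second player sends [2, 2, 1].
print(battle_score([3, 1, 1], [2, 2, 1]))
```

One win, one loss, and one tie net out to zero for the first player.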

When we wrote code to evaluate rock-paper-scissors, the big-picture question was the frequency with which the computer should play each of three available moves. Thus, for example, we defined COMPUTER1STRATEGY to be a list of three decimals, with the first representing the likelihood that computer 1 would play rock, the second the likelihood that computer 1 would play paper, and the third the likelihood that computer 1 would play scissors. This time, however, instead of three possible moves there are twenty-one. That is, a commander can send zero battalions to the first battlefield, zero to the second battlefield, and five to the third battlefield; or zero battalions to the first, one to the second, and four to the third; or zero battalions to the first, two to the second, and three to the third; and so on. The new COMPUTER1STRATEGY and COMPUTER2STRATEGY variables will thus each need twenty-one entries, one for each possible deployment pattern.

There are twenty-one different ways to deploy five battalions across three battlefields. The list COMPUTER1STRATEGY will therefore have twenty-one entries, one representing the probability that computer 1 will choose the first deployment pattern, the next representing the probability that it will choose the second deployment pattern, and so on, for all twenty-one possible patterns.

Yes, that means that COMPUTER1REGRET and COMPUTER2REGRET now also need twenty-one values because here, as before, the computer will need to separately track the total regret associated with each forsaken choice. Using the order shown in the chart above, COMPUTER1REGRET[5] would therefore store the total regret associated with the failure to choose the deployment [0, 4, 1], while COMPUTER1REGRET[21] would track the total regret associated with the failure to choose the deployment [5, 0, 0].
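
The chart itself is not reproduced in this text, but an ordering consistent with the entries named above can be generated with two nested loops. The sketch below assumes that ordering. Note that Python lists count from zero, so the fifth entry sits at index 4 and the twenty-first at index 20, whereas the numbering in the text starts at one.

```python
BATTALIONS = 5

possible_deployments = []
for first in range(BATTALIONS + 1):
    for second in range(BATTALIONS - first + 1):
        # Whatever is left over goes to the third battlefield.
        third = BATTALIONS - first - second
        possible_deployments.append([first, second, third])

print(len(possible_deployments))    # number of legal deployments
print(possible_deployments[4])      # fifth entry
print(possible_deployments[20])     # twenty-first entry
```

The list has twenty-one entries, beginning with [0, 0, 5] and [0, 1, 4] and ending with [5, 0, 0].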

As for measuring regret, our score function will need some work, both to accommodate the possibility of twenty-one different moves and to implement the battlefield-by-battlefield comparisons. The underlying logic, however, is the same as what we applied for rock-paper-scissors. For example, suppose that SCORE() is called with a first player whose strategy includes choosing the deployment [1, 3, 2] roughly 10 percent of the time and a second player whose strategy includes choosing [2, 0, 1] about 15 percent of the time. Focusing just on those portions of their respective strategies, we know that the likelihood that the first player picks [1, 3, 2] at a time when the second player is playing [2, 0, 1] is 0.10 times 0.15, which is 0.015. We also know that when this rare matchup does play out, the first player will win two battles but lose one, for a net score of +1.

Of course, that is just one possible combination out of 441 available pairings. That is, COMPUTER1STRATEGY for this game is a list of twenty-one probabilities, one for each possible pattern of deployment. Likewise, COMPUTER2STRATEGY is also a list of twenty-one probabilities, one for each possible pattern of deployment. The SCORE() function thus needs to iterate through all 441 possible combinations, calculating the odds that each specific combination occurs and then comparing the deployments battlefield by battlefield to see how many battlefields are won by each player. That is our new SCORE() function.

Sample code appears below. Note that the input this time is the requesting player’s twenty-one-number strategy, the opponent’s twenty-one-number strategy, and then also POSSIBLEDEPLOYMENTS, which is the list of the twenty-one available moves. That is, POSSIBLEDEPLOYMENTS[1] is [0, 0, 5] and POSSIBLEDEPLOYMENTS[2] is [0, 1, 4], with POSSIBLEDEPLOYMENTS[2][0] being the 0 from [0, 1, 4] and POSSIBLEDEPLOYMENTS[2][1] being the 1 from [0, 1, 4]. The function needs this list in order to understand what the numbers in MYSTRATEGY and YOURSTRATEGY mean.

Our main program is then very similar to the main program we wrote for rock-paper-scissors. Back then, the loop began with computer 1 calling SCORE() to calculate the expected outcome associated with its then-current strategy. After that, computer 1 called SCORE() three more times, once testing to see if PLAYROCK gave a better result, once testing to see if PLAYPAPER gave a better result, and once testing to see if PLAYSCISSORS gave a better result. This time, however, instead of testing just three alternative approaches, the computer needs to test twenty-one possibilities: a TESTSTRATEGY of [1, 0, 0, . . . 0, 0], where the computer would only use the deployment [0, 0, 5]; a TESTSTRATEGY of [0, 1, 0, . . . 0, 0], where the computer would only use the deployment [0, 1, 4]; and so on, through the last option, [0, 0, 0, . . . 0, 1], where the computer would only use the deployment [5, 0, 0]. As before, the computer keeps track of its regret and adjusts its strategy in light of those measured disappointments.
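
A compact version of that loop might look like the sketch below. Several details are assumptions made only so that the example runs: both computers start from an even mix, both update at the same moment each round, the round count `ROUNDS` is arbitrary, and the helper name `learn` is hypothetical. The sketch makes no claim about how many rounds this variant needs before it settles.

```python
BATTALIONS = 5
ROUNDS = 100  # hypothetical round count

possible_deployments = [
    [first, second, BATTALIONS - first - second]
    for first in range(BATTALIONS + 1)
    for second in range(BATTALIONS - first + 1)
]
COUNT = len(possible_deployments)

def battle_score(mine, yours):
    wins = sum(1 for m, y in zip(mine, yours) if m > y)
    losses = sum(1 for m, y in zip(mine, yours) if m < y)
    return wins - losses

def score(my_strategy, your_strategy):
    expected = 0.0
    for i in range(COUNT):
        for j in range(COUNT):
            likelihood = my_strategy[i] * your_strategy[j]
            if likelihood > 0:
                expected += likelihood * battle_score(
                    possible_deployments[i], possible_deployments[j])
    return expected

def learn(my_strategy, your_strategy, my_regret):
    """Measure regret for every pure alternative, then rebuild the strategy."""
    expected_outcome = score(my_strategy, your_strategy)
    for k in range(COUNT):
        test_strategy = [0] * COUNT
        test_strategy[k] = 1          # use only deployment k
        possible_outcome = score(test_strategy, your_strategy)
        if possible_outcome > expected_outcome:
            my_regret[k] += possible_outcome - expected_outcome
    total_regret = sum(my_regret)
    if total_regret == 0:
        return my_strategy            # nothing to regret yet
    return [r / total_regret for r in my_regret]

computer1_strategy = [1 / COUNT] * COUNT
computer2_strategy = [1 / COUNT] * COUNT
computer1_regret = [0.0] * COUNT
computer2_regret = [0.0] * COUNT

for _ in range(ROUNDS):
    new1 = learn(computer1_strategy, computer2_strategy, computer1_regret)
    new2 = learn(computer2_strategy, computer1_strategy, computer2_regret)
    computer1_strategy, computer2_strategy = new1, new2

# Show each deployment next to the frequency computer 1 now assigns to it.
for deployment, frequency in zip(possible_deployments, computer1_strategy):
    print(deployment, round(frequency, 3))
```

Apart from the longer lists, the structure is the same as in rock-paper-scissors: score the current strategy, score each pure alternative, record any shortfall as regret, and renormalize.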

Our Blotto code ends up suggesting that the computer randomly choose from among the various strategies that involve sending three battalions to one field, two battalions to another field, and zero battalions to the last field. That is, the computer will sometimes deploy its battalions as [3, 0, 2], sometimes as [0, 3, 2], and so on, but never as [1, 1, 3] or [0, 0, 5]. And the amazing thing is that this approach has the same property that we flagged with respect to the computer’s regret-based strategy for rock-paper-scissors: If you were to play this strategy against me, I would in response happily play this same strategy against you, and you would in response have no incentive to do anything but continue to play this same strategy against me. Me randomly choosing variations of [3, 2, 0] and you responding with random variations of [3, 2, 0] is a stable equilibrium, game after game after game, with neither of us having any incentive to alter our approach.

Don’t believe me? Imagine that you are playing variations of [3, 2, 0]. I am certainly not going to respond with variations of [5, 0, 0] because the various pairings there end up giving me either a net tie or a net loss. I would rather stick with my original plan to mix variations of [3, 2, 0] because at least there I have one way to win, one way to lose, and everything else is a tie. Similarly, I am not going to opt for variations of [2, 2, 1] because, again, my outcomes there would be either a net tie or a net loss. Once more, the [3, 2, 0] family of options is better.

Interestingly, other possible variations are not tempting either because, on average, none of them outperform [3, 2, 0]. For instance, if you play [3, 2, 0] and I randomly play either [3, 1, 1], [1, 3, 1], or [1, 1, 3], I would with equal likelihood earn a net tie, a net loss, or a net win. That’s the same net zero expectation I have for my current strategy of [3, 2, 0]. The same math applies if I were to play variations of [4, 1, 0]. That is, if you play [3, 2, 0] and I randomly play some mix of [4, 1, 0], [4, 0, 1], [1, 4, 0], [1, 0, 4], [0, 1, 4], and [0, 4, 1], I end up earning zero points on average because two of those pairings are winners for me, two are losers, and two are ties.
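
These comparisons are easy to check by brute force. The sketch below averages the net score of each family of deployments against an even mix of the six [3, 2, 0] variations. The helper names `family` and `average_against_320` are hypothetical, and exact fractions are used to avoid rounding noise.

```python
from fractions import Fraction
from itertools import permutations

def battle_score(mine, yours):
    wins = sum(1 for m, y in zip(mine, yours) if m > y)
    losses = sum(1 for m, y in zip(mine, yours) if m < y)
    return wins - losses

def family(pattern):
    """All distinct orderings of a deployment pattern."""
    return sorted(set(permutations(pattern)))

def average_against_320(pattern):
    """Average net score of a family against the [3, 2, 0] family."""
    opponents = family((3, 2, 0))
    mine = family(pattern)
    total = sum(battle_score(m, y) for m in mine for y in opponents)
    return Fraction(total, len(mine) * len(opponents))

for pattern in [(3, 2, 0), (5, 0, 0), (2, 2, 1), (3, 1, 1), (4, 1, 0)]:
    print(pattern, average_against_320(pattern))
```

The [3, 2, 0], [3, 1, 1], and [4, 1, 0] families all average exactly zero, while the [5, 0, 0] and [2, 2, 1] families average a loss of one third of a battlefield per game, in line with the argument above.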

And if I stick with the [3, 2, 0] family in response to your plan to use the [3, 2, 0] family, guess what? You stick with your plan too. Indeed, if you send a spy to my military planning meeting and learn that I am fully committed to the [3, 2, 0] family of troop deployments, you will use that information to change absolutely nothing. You will know that my strategy is an unpredictable mix of [3, 2, 0], [3, 0, 2], [2, 0, 3], [2, 3, 0], [0, 2, 3], and [0, 3, 2]; armed with that battlefield intelligence, you will stay the course and also randomly deploy your battalions in one of those patterns. Exactly as our code suggests, these choices are in perfect equilibrium.

And the computer figures that out all by itself.

Chapter Challenge

Our code right now allows each commander to deploy only five battalions across three battlefields. Your challenge for this chapter is to expand the game to allow more battalions and more fields.

As you work, you will need to make improvements throughout our sample code. For instance, right now the code lists out the twenty-one moves that are allowed in a game that involves five battalions. For more flexible code, you will have to create a loop that generates all the legal patterns in light of that version’s permissible number of bases and permissible number of battalions. You will then also have to think about ways to define all the strategy and regret variables such that they can accommodate the right number of entries, which again might be something more or less than twenty-one.
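
As a starting point, one possible way to generate every legal pattern for any number of battalions and fields is a short recursive function. This is only a sketch of one approach, and the name `all_deployments` is hypothetical.

```python
def all_deployments(battalions, fields):
    """List every way to split the battalions across the fields."""
    if fields == 1:
        # Only one field left, so every remaining battalion goes there.
        return [[battalions]]
    patterns = []
    for first in range(battalions + 1):
        # Commit some battalions to this field, then split the rest.
        for rest in all_deployments(battalions - first, fields - 1):
            patterns.append([first] + rest)
    return patterns

possible_deployments = all_deployments(5, 3)
print(len(possible_deployments))

# Strategy and regret lists can then be sized from the pattern count.
count = len(possible_deployments)
computer1_strategy = [1 / count] * count
computer1_regret = [0.0] * count
```

Sizing the strategy and regret lists from `len(possible_deployments)` means nothing else in the program needs to know whether there are twenty-one patterns or some other number.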

Want more? See if you can further change the code such that some battlefields are worth more points than others. That adds a new wrinkle to the necessary decision-making because suddenly some wins are more valuable than the rest. Of course, if the opposing team focuses its battalions on the most prized battlefields, the right response might be to ignore those battlefields entirely and instead sweep victories across all the lower-stakes fights. You are probably not sure how to play in light of these many complexities, but can you ask the computer to figure it out for you?
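
One way to begin is to give each battlefield a point value and let the comparison add or subtract that value instead of 1. The sketch below assumes that a win earns the field's value and a loss costs the same amount; the text leaves the exact scoring open, and the name `weighted_battle_score` and the example values are hypothetical.

```python
def weighted_battle_score(mine, yours, field_values):
    """Net points for the first deployment when fields differ in value."""
    net = 0
    for my_troops, your_troops, value in zip(mine, yours, field_values):
        if my_troops > your_troops:
            net += value
        elif my_troops < your_troops:
            net -= value
    return net

field_values = [3, 1, 1]  # hypothetical: the first field is worth triple

# Ignoring the prized field and sweeping the other two.
print(weighted_battle_score([0, 3, 2], [5, 0, 0], field_values))
print(weighted_battle_score([0, 3, 2], [5, 0, 0], [1, 1, 1]))
```

With equal values, sweeping the two cheaper fields nets +1. With the first field worth three points, the same deployment nets -1, which is the kind of tradeoff the regret calculations would have to sort out.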

Afterword

Between 1985 and 1997, the technology company IBM invested millions of dollars developing what would ultimately come to be known as Deep Blue, the first computer to ever defeat a reigning world champion in the game of chess. In 2017, the Google-affiliated artificial intelligence company DeepMind did one better, unveiling self-learning code that, through random play and over just twenty-four hours, learned the game so thoroughly that it, too, could reliably beat the best human players. Why should we care about this? How could it possibly make sense for companies like IBM and Google to invest so much time and money in a board game?

The reason, I suspect, is that these companies know what is also this book’s secret truth: algorithms developed in the context of word games, board games, and strategy games inevitably have powerful applications elsewhere, too. A computer that can find its way out of a maze might later navigate a self-driving car. A version of Google’s chess program has already been put to work predicting the structure of certain human proteins, research that promises to unlock significant advances in both medicine and biology. Simply put, it’s all fun and games until it’s not. And at that moment, if we play our cards just right, a good game algorithm can quite literally change the world.

Python Review

Print

Variables

Input

Math Operators

If/Else Conditionals

For Loops

While Loops

Random Numbers

Lists

Arrays

Functions

THE PRINT COMMAND

  • 1. PRINT will repeat on the screen whatever you type.
    • print ("Hello, I am a lowly Chromebook.")
    • print ("I wish I were a Mac.")
  • 2. PRINT can also be used to print a blank line.
    • print ()
  • 3. PRINT can tell us the current value of a variable.
    • myAge = 97
    • print (myAge)
  • 4. And PRINT can tell us the answer to a math problem.
    • print (5*3 + 10)
  • 5. After a PRINT statement, Python automatically moves to the next line. To override that default, define the end of the line to be nothing.
    • print ("This will print as one ", end="")
    • print ("line.")
  • 6. You can use the % symbol to hold space for the value of a variable. Use %d for whole numbers, %s for text, and %f for decimals.
    • favoriteNumber = 3
    • print ("If you roll a %d, you win." % favoriteNumber)
    • favoriteFood = "pasta"
    • print ("Do not eat my %s." % favoriteFood)

USING VARIABLES TO STORE INFORMATION

  • 1. Variables are like mailboxes inside the computer. Name them and then use them to store letters, numbers, or words.
    • myAge = 23
    • myName = "Stan"
  • 2. When assigning a value to a variable, the variable name goes on the left.
    • myAnswer = 5 + 5
  • 3. Variables can hold almost anything, including letters, phrases, numbers, and even True/False flags.
    • greeting = "Hello, world."
    • confession = "I really am just a lowly Chromebook."
    • counter = 10
    • price = 2.58
    • gameOver = False
  • 4. You can change what is stored in a variable.
    • secretCode = 14
    • secretCode = 15
    • print (secretCode)
  • 5. You can add, subtract, and the like using variables.
    • counter = 1
    • counter = counter + 5
    • print (counter)

THE INPUT COMMAND

  • 1. You can ask the user to INPUT a letter or word and then store that information in a variable so that you can use it later.
    • yourName = input("What is your first name? ")
    • print ("Nice to meet you, %s." % yourName)
  • 2. You can also ask for a number. If you do, make sure to tell the computer that the information should be stored as, say, an integer or floating-point decimal, not a letter or word.
    • wins = int(input("How many rounds did you win? "))
    • cost = float(input("How much did you spend? "))
    • print ("Your cost per win was . . .", end = "")
    • print ("$%2.2f." % (cost/wins))
  • 3. You can also use INPUT to cause your program to pause. That can be helpful during an interaction with the user, and also when you are trying to debug code.
    • print ("I am thinking of a number.")
    • print ("Hit ENTER to see it.")
    • pause = input("")
    • print ()
    • print ("My number is 27!")
  • 4. Put a blank space at the end of each question, so the output is easy to read.
    • answer = input("Do you see the problem here?")
    • answer = input("And isn't this much better? ")
  • 5. Or put the question on one line and the answer on the next.
    • print ("Please type the name of your favorite food.")
    • choice = input ("")

USING MATH OPERATORS

  • 1. Python can do math.
    • numOne = int(input("Pick a whole number: "))
    • numTwo = int(input("Pick another one: "))
    • total = numOne + numTwo
    • difference = numOne - numTwo
    • product = numOne * numTwo
    • quotient = numOne/numTwo
    • squared = numOne ** 2
    • print ("%d + %d = %d" % (numOne, numTwo, total))
    • print ("%d – %d = %d" % (numOne, numTwo, difference))
    • print ("%d * %d = %d" % (numOne, numTwo, product))
    • print ("%d / %d = %.2f" % (numOne, numTwo, quotient))
    • print ("%d squared is %d" % (numOne, squared))
  • 2. The modulo operator (%) reports the remainder that is left when the computer divides one number by another. You will often use this to distinguish odd from even numbers.
    • number = int(input("What number? "))
    • print ("If your number is even,")
    • print ("I will print a '0' below.")
    • print ("Otherwise, I will print a '1'.")
    • print (number % 2)

DECISIONS WITH IF AND ELSE

  • 1. You can tell the computer to run particular code only if a certain condition is true. The IF statement ends in a colon, and the code you want to run must be indented.
    • yourAge = int(input("How old are you? "))
    • if yourAge > 21:
      • print ("You are more than 21 years old!")
  • 2. The computer knows a lot of ways to compare items.
    • a = int(input("Type a number. "))
    • b = int(input("Type another number. "))
    • if a == b:
      • print ("Your numbers are the same. Boring.")
    • if a != b:
      • print ("Your numbers are different.")
    • if a > b:
      • print ("Your first number was bigger.")
    • if a < b:
      • print ("Your first number was smaller.")
    • if a <= b:
      • print ("Smaller, or equal? I'm not sure.")
    • if a >= b:
      • print ("Bigger, or equal? I'm not sure.")
  • 3. You can also use the connecting words AND and OR.
    • a = int(input("Type a number. "))
    • b = int(input("Type another number. "))
    • c = int(input("Type a third number. "))
    • if a == b and b == c:
      • print ("All three numbers are the same. Boring.")
    • if a == 3 or b == 3 or c == 3:
      • print ("At least one of your numbers is a 3.")
  • 4. You can write more complicated conditional statements by using the combination IF/ELSE. Pay close attention to your indentations. Each IF/ELSE pair must line up vertically.
    • a = int(input("Type a number. "))
    • b = int(input("Type another number. "))
    • c = int(input("Type a third number. "))
    • if a == b:
      • print ("Your first two numbers are the same.")
      • print ("Let me check the third number.")
      • if b == c:
        • print ("Wow. Your lack of creativity amazes me.")
      • else:
        • print ("Nope. That one is different.")
    • else:
      • print ("Your first two numbers are not the same.")
      • print ("I will not even bother to look at the third.")
  • 5. You can combine ELSE with IF to make the contraction ELIF.
    • mathScore = int(input("How did you do on the test? "))
    • if mathScore > 90:
      • print ("More than 90! Wow.")
    • elif mathScore > 80:
      • print ("Between 80 and 90.")
      • print ("You did well.")
    • else:
      • print ("Below 80. Rough day.")
  • 6. Python can compare words and letters, too.
    • name = input("What is your first name? ")
    • if name < "Python":
      • print ("That comes before Python alphabetically.")
    • elif name == "Python":
      • print ("Your name is Python? Really?!?!?")

REPEATING AND COUNTING WITH FOR LOOPS

  • 1. Use a FOR loop to repeat code.
    • for counter in range(10):
      • print("tick")
    • print ("BOOM!")
  • 2. Or use a FOR loop to count. Remember, computers start counting at zero.
    • for counter in range(10):
      • print (counter)
  • 3. You can set the start number, the end number, and even the amount that will be added or subtracted each time through the loop.
    • print ("Now, let's cheer!")
    • for cheer in range(2,9,2):
      • print (cheer)
  • 4. You can nest one loop inside another loop. Here, for each of ten rows, the code iterates through five seats.
    • for rows in range(10):
      • for seats in range(5):
        • print (seats+1, end=" ")
      • print ()
  • 5. You can even use a FOR loop with a string.
    • name = input("What is your name? ")
    • print ("I will count the letters in your name.")
    • counter = 0
    • for letter in name:
      • counter = counter + 1
    • print (counter)

REPEATING AND COUNTING WITH WHILE LOOPS

  • 1. Use a FOR loop when you know how many times you want to loop, but use a WHILE loop when you want to keep looping until something specific happens.
    • answer = input("Type a capital letter. ")
    • while len(answer) != 1 or answer > "Z" or answer < "A":
      • print ("That was not a capital letter.")
      • print ("Let's try again.")
      • answer = input("Type a capital letter. ")
  • 2. This code would be awkward to write with a FOR loop, but it is easy to write using WHILE.
    • target = 21
    • total = 0
    • while total < target:
      • roll = int(input("What number did you roll? "))
      • total = total + roll
      • print ("That gives you a total so far of %d." % total)
    • print ("Finished!")
  • 3. True/False flags are often used in WHILE loops.
    • finished = False
    • counter = 0
    • while not finished:
      • print ("Still working.")
      • counter = counter + 1
      • if counter == 3:
        • finished = True
    • print (counter)

GENERATING RANDOM NUMBERS

  • 1. To use random numbers, import the built-in library RANDOM. A library is just a resource that teaches the computer new functions.
  • 2. With that library, we can ask for a random integer using the command shown below.
    • import random
    • diceRoll = random.randint(1,6)
    • secondRoll = random.randint(1,6)
    • print ("You got a %d and a %d." % (diceRoll, secondRoll))
  • 3. Here's another example.
    • import random
    • randomNum = random.randint(1,100)
    • print ("I am thinking of a number between 1 and 100.")
    • guess = int(input("What's your guess? "))
    • print ("My number was %d." % randomNum)
    • print ("Did you guess it?")
  • 4. We can also ask for a random decimal between 0 and 1.
    • import random
    • randomNum = random.random()
    • print("Here is a random decimal: %0.2f." % randomNum)

GROUPING VARIABLES INTO LISTS

  • 1. So far, we have been storing information in separately named variables.
    • myName = "Inigo Montoya"
    • howManyFingers = 6
  • 2. Sometimes, however, we need to create a large number of related variables, and we do not want to type everything out. In those situations, we can use a list. To create one, type the items using commas and brackets, as shown.
    • stateList = ["Ohio", "Iowa", "North Dakota", "Texas"]
  • 3. Alternatively, start with an empty list and then add items one by one.
    • stateList = []
    • stateList.append("Ohio")
    • stateList.append("Montana")
    • stateList.append("North Dakota")
    • stateList.append("South Dakota")
    • stateList.append("Texas")
  • 4. We can print our list in full . . .
    • print (stateList)
  • 5. Or we can access our list one item at a time. But be careful: the first item in a list is item [0], not item [1].
    • print(stateList[0])
    • print(stateList[3])
  • 6. We can even access lists using loops and variables.
    • listLength = len(stateList)
    • for counter in range(listLength):
      • print (stateList[counter])
  • 7. We can replace and remove items from a list.
    • dwarfList = ["Sleepy", "Grumpy", "Doc", "Sneezy", "Lazy"]
    • dwarfList[3] = "Bashful"
    • dwarfList.remove("Lazy")
    • print (dwarfList)
  • 8. In the examples above, our index numbers were arbitrary. For instance, there was nothing magical about dwarfList[1] being Grumpy as opposed to Doc. But sometimes an index has real meaning. Below, for instance, a spin of 2 is stored in spins[2], and a spin of 5 is stored in spins[5].
    • import random
    • # prepare the list
    • spins = []
    • for counter in range(7):
      • spins.append(0)
    • # spin the spinner 100 times
    • for counter in range(100):
      • spin = random.randint(1,6)
      • spins[spin] = spins[spin] + 1
    • # report the results
    • print ("I spun a spinner 100 times.")
    • print ("Here's what happened:")
    • for count in range(6):
      • print ("I spun %d . ." % (count+1), end="")
      • print ("%2d times." % spins[count+1])

GROUPING LISTS INTO ARRAYS

  • 1. We can think of an array as a list of lists. A sports stadium, for instance, might be defined as a list of rows, where each row is then a list of seats.
    • seats = []
    • for rows in range(10):
      • seats.append(["1","2","3","4","5","6"])
  • 2. We can print that array seat by seat. Just use two indexes, the first for the row number and the second for the seat number.
    • print ("Here's our seat map.")
    • for rows in range(10):
      • for columns in range(6):
        • print (seats[rows][columns], end=" ")
      • print ()
  • 3. Change array values the normal way, treating each item as a unique, independent variable.
    • seats[0][0] = "A"
    • seats[1][0] = "B"
    • seats[2][0] = "C"
    • seats[3][0] = "D"

DEFINING FUNCTIONS

  • 1. Python knows a large number of standard functions like PRINT and INPUT. Sometimes, however, you will want to define your own functions, in essence creating new commands that you can then use in your own program. The example below shows you the format, including the required indentation.
    • def myFunction():
      • print ("Hello from inside the function.")
      • print ("How are you?")
  • 2. Functions often accept information that the function will then use. This is true for built-in functions like PRINT, which accepts details about what to display on the screen. It can be true for functions you write, too.
    • # define a new function
    • def square(value):
      • print (value*value)
    • # then use your function in the main program
    • number = int(input("Type a number: "))
    • print ("The square of your number is . . .", end="")
    • square(number)
  • 3. Need to pass in a lot of information? No problem.
    • def average (score1, score2, score3):
      • averageValue = (score1 + score2 + score3)/3
      • print ("%2.1f" % averageValue)
    • firstTest = int(input("What was your first score? "))
    • secondTest = int(input("What was your second? "))
    • thirdTest = int(input("And your third? "))
    • print ("Your average is ", end="")
    • average(firstTest, secondTest, thirdTest)
  • 4. You can even pass in a list or an array. If you do, however, be careful. If you change the list or array inside the function, you might accidentally change it outside the function, too. To minimize confusion, I always make a second copy of any passed-in list or array before editing it. That way, I can always be sure that my functions are changing the copy, not the original.
    • def regrade(scores):
      • # make a new list
      • experiment = []
      • for counter in range(len(scores)):
        • experiment.append(scores[counter])
      • # edit that new list
      • experiment[0] = 100
      • print ("If you had earned a perfect first test,")
      • print ("your scores would be: ", end="")
      • print (experiment)
    • # the main program
    • scores = [80, 90, 95]
    • regrade(scores)
    • print ("But sadly, you actually scored: ", end="")
    • print(scores)
  • 5. The examples thus far only print information to the screen. But a function can also pass information back so that it can be used elsewhere in the program.
    • def average (score1, score2, score3):
      • averageValue = (score1 + score2 + score3)/3
      • return (averageValue)
    • firstTest = int(input("What was your first score? "))
    • secondTest = int(input("What was your second? "))
    • thirdTest = int(input("And your third? "))
    • yourAverage = average(firstTest, secondTest, thirdTest)
    • print ("Your average was %2.2f." % yourAverage)
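
The copy-before-editing advice in item 4 above can be checked with a short experiment. The sketch below is only an illustration: the function names curveInPlace and curveCopy are made up for this example, and list(scores) is used as a shortcut for the copying loop shown in item 4.

```python
def curveInPlace(scores):
    # edits the very same list the caller passed in
    scores[0] = 100

def curveCopy(scores):
    # list(scores) builds a brand-new list holding the same items
    experiment = list(scores)
    experiment[0] = 100
    return experiment

original = [80, 90, 95]

curved = curveCopy(original)
print (original)    # the original list is untouched here
print (curved)

curveInPlace(original)
print (original)    # now the original list itself has changed
```

The first function changes the caller's list because both names refer to one list; the second changes only its private copy.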

References

My goal in writing this book was to introduce some of computer science’s most engaging concepts to an audience that might not yet be ready to read about those ideas in the more formal coding literature. Thus, I have intentionally minimized my use of technical jargon, avoided advanced mathematics, steered clear of advanced coding structures, and even allowed minor inefficiencies to creep into my code when doing so promised to make the code more readable. That said, in this closing note, I want to tie this book’s substance into that less accessible literature so that interested readers will know where to turn as they want to learn more.

Chapter 1 opens with the example of binary search, which is a powerful search algorithm commonly taught in introductory computer science courses. I use binary search to show the power of elimination algorithms, and then I focus on one elimination algorithm of particular interest: an algorithm tailored to the game Wordle. There are obviously many other uses for both binary search and elimination. One famous example is the game Mastermind, which has been studied by many leading computer scientists including the legendary Donald Knuth, a mathematician who by most accounts launched the entire field of algorithmic design.
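
For readers who want to see the pick-the-middle idea in code, here is a minimal sketch of binary search over a sorted list. It is not the code from chapter 1; the function name binarySearch and the step counter are assumptions added for illustration.

```python
def binarySearch(sortedList, target):
    # returns (index, guesses); index is -1 if target is absent
    low = 0
    high = len(sortedList) - 1
    guesses = 0
    while low <= high:
        guesses = guesses + 1
        middle = (low + high) // 2
        if sortedList[middle] == target:
            return middle, guesses
        elif sortedList[middle] < target:
            # the target can only be in the upper half
            low = middle + 1
        else:
            # the target can only be in the lower half
            high = middle - 1
    return -1, guesses

numbers = list(range(1, 101))
position, guesses = binarySearch(numbers, 73)
print ("Found 73 at index %d after %d guesses." % (position, guesses))
```

Each guess eliminates about half of the remaining candidates, so a list of 100 items never needs more than seven guesses.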

My discussions in chapters 2 and 3 introduce four exciting and foundational concepts: backtracking, depth-first search, breadth-first search, and recursion. A good next step would be to focus on more advanced pathfinding algorithms like Dijkstra’s algorithm and A*, each of which offers important advantages when it comes to finding the shortest path but at reasonable overall time and processing costs. Here, too, there is worthwhile work to be done in thinking about how to evaluate the costs of using one algorithm versus another. A good starting place is a measuring technique conventionally called Big O notation.
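
As a reference point before moving on to Dijkstra's algorithm or A*, the sketch below shows breadth-first search finding a shortest path in a tiny word-ladder graph. The graph, the function name shortestPath, and the use of a dictionary and collections.deque are all assumptions made for this example rather than the approach taken in chapters 2 and 3.

```python
from collections import deque

def shortestPath(graph, start, goal):
    # each queue entry is the full path taken to reach its last word
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if current == goal:
            return path
        for neighbor in graph[current]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    # no route connects start to goal
    return None

# a made-up graph: each word lists the words one letter away
wordGraph = {
    "cat": ["cot", "cut"],
    "cot": ["cat", "cog", "dot"],
    "cut": ["cat"],
    "dot": ["cot", "dog"],
    "cog": ["cot", "dog"],
    "dog": ["cog", "dot"],
}
print (shortestPath(wordGraph, "cat", "dog"))
```

Because the queue hands back paths in order of length, the first path that reaches the goal is guaranteed to be one of the shortest. Dijkstra's algorithm extends the same idea to steps that have different costs.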

Chapter 4 opens with the coin game Nim and uses that game to explain both the need for, and the inner workings of, minimax. From there, chapters 4 and 5 correctly capture the main insights that drive recursive minimax, the only real exception being that I implement minimax using two functions (FINDBESTMOVE() and WHATHAPPENS()) whereas more sophisticated presentations typically do all the work within one. Chapter 6 then introduces what the literature refers to as alpha-beta pruning. Taken together, all of that positions readers to tackle more rigorous treatments of these various techniques. A good place to start is iterative deepening, which is one of the strategies we begin to explore in chapter 6’s concluding challenge, where the computer in essence uses early searches to strategically sequence later ones.
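
To give a sense of what a one-function version can look like, here is a rough sketch of recursive minimax for a simple coin game. The rules are an assumption chosen for illustration (players alternate taking one, two, or three coins, and whoever takes the last coin wins) and may differ from the version of Nim used in chapter 4; the function name minimax is likewise hypothetical.

```python
def minimax(coins, computerTurn):
    # returns (score, move): +1 means the computer wins with best play,
    # -1 means the opponent wins; move is the number of coins to take
    if coins == 0:
        # whoever moved last took the final coin and won
        if computerTurn:
            return -1, None
        return 1, None
    bestMove = None
    if computerTurn:
        bestScore = -2      # worse than any real score
    else:
        bestScore = 2       # better than any real score
    for take in [1, 2, 3]:
        if take <= coins:
            score, ignored = minimax(coins - take, not computerTurn)
            if computerTurn and score > bestScore:
                bestScore = score
                bestMove = take
            if not computerTurn and score < bestScore:
                bestScore = score
                bestMove = take
    return bestScore, bestMove

score, move = minimax(10, True)
print ("With 10 coins, take %d (score %d)." % (move, score))
```

The same function plays both sides: on the computer's turn it keeps the highest score, and on the opponent's turn it keeps the lowest.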

Chapters 7, 8, and 9 playfully describe as “darts” what computer scientists would call Monte Carlo simulation and Monte Carlo Tree Search. More advanced variations use better data structures and significantly more sophisticated mathematics, but they follow the conceptual path I sketch here. The vignette I use to open chapter 7, meanwhile, is inspired by a famous demonstration that uses Monte Carlo methods to estimate the value of the geometry constant pi. I thought it more memorable to instead estimate the area of a squiggle and to hand some darts to a few drunken mathematicians.
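
The pi demonstration mentioned above is short enough to sketch here. Darts land at random points in a square one unit on each side, and the fraction that fall inside a quarter circle of radius one approximates pi divided by four. The function name estimatePi and the choice of 100,000 darts are assumptions for this illustration.

```python
import random

def estimatePi(darts):
    hits = 0
    for counter in range(darts):
        # a random point in the square from (0,0) to (1,1)
        x = random.random()
        y = random.random()
        # inside the quarter circle if its distance from (0,0) is at most 1
        if x*x + y*y <= 1:
            hits = hits + 1
    # quarter-circle area / square area = pi/4
    return 4 * hits / darts

estimate = estimatePi(100000)
print ("After 100000 darts, pi is roughly %.3f." % estimate)
```

The answer changes slightly on every run, and throwing more darts tends to tighten the estimate.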

Chapter 10 introduces a coding construct that is formally known as a Markov chain. My version is built using relatively simple lists; traditional versions are built using more elegant data structures like classes and dictionaries. For next steps, there is a wonderful literature on how to use “hidden” Markov models to detect subtle data patterns. A lot of speech recognition is done this way, and the technique has also shown promise as applied to all sorts of genetic and medical data.
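
For comparison with the list-based version, the sketch below builds a small Markov chain out of dictionaries. It is an illustration under stated assumptions, not the code from chapter 10: the history string (R, P, and S for rock, paper, and scissors) is made up, and buildChain and predictNext are hypothetical names.

```python
def buildChain(history):
    # chain[a][b] counts how often move b came right after move a
    chain = {}
    for position in range(len(history) - 1):
        current = history[position]
        following = history[position + 1]
        if current not in chain:
            chain[current] = {}
        if following not in chain[current]:
            chain[current][following] = 0
        chain[current][following] = chain[current][following] + 1
    return chain

def predictNext(chain, current):
    # pick the move that most often followed the current one
    if current not in chain:
        return None
    followers = chain[current]
    return max(followers, key=followers.get)

history = "RPSRPSRPRRPS"
chain = buildChain(history)
print (chain["R"])
print ("After R, expect", predictNext(chain, "R"))
```

A dictionary lets the code look up a move by name instead of translating it into an index number first, which is the main convenience over plain lists.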

Chapter 11 then really builds a neural network, albeit without the gnarly math. A great next step here would be to read about bias values, which are the numbers I mention at the end of the chapter that can be added to my CRAZYMATH() formulas to, in essence, set thresholds below or above which certain nodes will be ignored. There are also countless papers that dig into questions of network design, offering ideas about how to decide whether a given network should have two intermediate steps versus three, and whether each step should be represented by five nodes, or ten, or something else, and even how to set starting values for all of the necessary weights.
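
To make the idea of a bias value concrete, here is a small sketch of a single node. It assumes the node applies a sigmoid function to a weighted sum, which may not match the exact formulas used in chapter 11; the inputs, weights, and the function name nodeOutput are invented for the example.

```python
import math

def sigmoid(value):
    # squashes any number into the range between 0 and 1
    return 1 / (1 + math.exp(-value))

def nodeOutput(inputs, weights, bias):
    # the bias is added to the weighted sum before squashing
    total = bias
    for counter in range(len(inputs)):
        total = total + inputs[counter] * weights[counter]
    return sigmoid(total)

inputs = [1, 0, 1]
weights = [0.5, -0.3, 0.8]
for bias in [-3, 0, 3]:
    print ("bias %2d gives %.3f" % (bias, nodeOutput(inputs, weights, bias)))
```

A strongly negative bias pushes the output toward 0, so the node stays quiet unless its inputs are large; a positive bias does the opposite.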

Chapter 12 concludes with a machine learning technique called counterfactual regret minimization. There are dozens of research papers exploring variations on this approach, as this one is still the subject of very active research. The math here gets pretty intense, typically requiring a rigorous understanding of both statistics and linear algebra in order to tackle even relatively simple applications. That said, this is one of many spaces where we can expect significant conceptual breakthroughs in the years ahead. If you happen to make one, reach out and I will be sure to include it in the book’s next edition.

Index

Absurdle, 13

Algorithms, importance of, xi–xii, 161

Alpha-beta pruning, 178

Backtracking, 15–18, 22–23, 26–27, 29–32, 177

Bias values, 178

Big O notation, 177

Binary search, 177

Black-box strategies, xvi, 133–146

Blackjack, 88–105, 107–108, 110, 113–117

Breadth-first search, 32–37, 177

Car navigation systems, 30

ChatGPT, 132

Chess, xiv, 27, 161

Coin games, 41–44, 46–47

Colonel Blotto, 153–160

Connect Four, xiv, 50–73, 79, 107–120

Counterfactual regret minimization, 147–160, 178

Decision trees, 17, 22, 44–46, 61, 64, 66–73, 79, 94–95, 107–108, 120. See also Pruning

Deep Blue, 161

DeepMind, 161

Depth-first search, 32, 34–37, 56–59, 68–72, 81, 120, 177

Dijkstra’s algorithm, 177

Eight Queens, 27, 35

Elimination algorithms, 3–13, 177

Fibble, 13

Guess Who, 5–6

Intuitive rules/strategies, xii

Iterative deepening, 178

Knuth, Donald, 177

Language imitation technologies, 132

Machine learning, xvi, 133–160, 178

Markov chain, 178

Mastermind, 177

Mazes, xiii, 15–16, 22–23, 26, 29

Minimax, 49, 50, 51, 58–59, 71, 85–86, 178

Monte Carlo simulation, 178

Network design, 178

Neural networks, 144–145, 178

Nim, 41–44, 46–47, 178

Pattern recognition, xvi, 127–132, 138, 178

Pick-the-middle strategy, 3–4

Pruning, xiv–xv, 59–73, 108, 120, 178. See also Decision trees

Python, review of, 165–175

Random simulation, xv, 78–86, 91–94, 107–108, 120

Recursion, 17–27, 41–50, 52–61, 67, 70–73, 177

Regret, xvi, 147–160

Rock-paper-scissors, xv–xvi, 123–131, 134–156, 158

Shortest path, xiii–xiv, 30–37

Sigmoid function, 141–142

Simulations. See Monte Carlo simulation; Random simulation; Strategic allocation of simulations

Speed, ways to maximize, xiv–xv, 51–73

Strategic allocation of simulations, xv, 87–89, 94–105, 107–120

Sudoku, xiii, 16–18, 24–26, 35–36

Tic-tac-toe, xi–xii, 44–49, 51, 58, 107, 147

Training data, xvi, 134–146

Turn-based games, simulations for, 107–120

2048 (puzzle game), 79–87, 107

Word Ladder, 29–37

Wordle, xiii, 6–13, 177